Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: ML for Lean: How to do it?


Jason Rute (Jan 18 2020 at 22:46):

I want to start the conversation of how to enable machine learning for Lean 3 and Lean 4. Of course there isn't just one application of machine learning or one type of machine learning. Right now, I can see two immediate applications/connections of machine learning and Lean:

  • Extending HOList to Lean. I’ve already started a document here on what this would take (and if we are serious about doing this, the N2Formal team at Google AI would probably be willing to provide some assistance). At first this would probably be more of a machine learning benchmark than a useable system, but I think we could also incorporate it into (a forked version of) Lean with some work.
  • Adding machine learning premise selection to the Lean hammer project

Here are the four areas where machine learning will eventually need to interface with Lean.

  1. At the tactic level. If we are ever to use ML from within Lean, we would need tactics which apply machine learning under the hood. The tactics could behave like simp and close the goal, or like library_search and suggest a proof which doesn’t itself use machine learning. Both have their uses. @Gabriel Ebner has already implemented something like this in Lean 3 with his Lean hammer tactic. The tactic uses C++ for premise selection, which means it could interface with TensorFlow or PyTorch instead. However, it also sounds like he had to hack Lean to make this work. So it sounds like we know how to do this in Lean 3, and hopefully it will be easier in Lean 4 (where we won’t have to fork Lean to make it work). However, one probably needs to document this more to see what is possible.
  2. Theorem dependency recording and statistics. For some applications of machine learning, it is important to have training data. The minimum needed for something like HOList is a list of theorem statements (and either their dependencies or a linear order of theorems where the dependencies come before the theorems which use them). For hammer premise selection, one needs something similar: the list of premises used in the proof of each theorem (and, for further training, the premises actually used in the hammer proof). It sounds like (at least for the hammer project) @Gabriel Ebner has a script which generates this information. It might not be in the form needed by the HOList project, however. Others have talked about how Lean stores theorems in the “environment”, although I don't understand the details. So it sounds like we have at least one way to do this in Lean 3, if not more. Again, the question remains of what form this information is available in and what needs it meets. I wonder if this will be easier in Lean 4.
  3. (Optional) Full proof recording at the tactic level. For supervised learning of tactics (and again, this is NOT needed for HOList, but would be nice to have), one needs recorded tactic proofs (so one needs the tactics used and their parameters). I have no idea if any work has been done here, or whether it is easy or hard.
  4. Outside communication and control of Lean’s tactic framework. For a system like DeepHOL/HOList, one needs to control a tactic search inside of Lean (preferably from within Python). Let me describe how this currently works in DeepHOL/HOList. A forked HOL Light version is put in a docker container. This container also contains a gRPC server which takes three request types. These requests are sent to the HOL Light docker container from another docker container which handles the proof guidance and training. The three requests are as follows: ApplyTactic says “apply [tactic] with [parameters] to [goal]." The server tries that in HOL Light and, if it succeeds, returns a new list of goals. VerifyProof says “check whether [list of tactics with parameters] solves [theorem/goal]." Last, RegisterTheorem registers a theorem so that it can be called later by the system. (Technically, in HOList it has to be the most recently verified theorem. Also, it returns an integer fingerprint of the theorem.) For a more detailed description of how these three requests work, see here. For DeepHOL/HOList or a similar system we would probably need to build something like this communication interface. (For DeepHOL/HOList, we would probably need to implement these three commands, but it might be good to step back and think about how this could be done to support other systems besides DeepHOL.) Some have mentioned there are already Python bindings for Lean which do something like this. Others have mentioned that the LSP handles stuff like this. It would be great to discuss this more concretely. Is there a clear path to implementing something like this in Lean 3? (It doesn’t need to be that fast, so I think we should try the easiest-to-implement idea first.) Will this be easier in Lean 4?
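
For concreteness, here is a rough sketch (in Python, with purely hypothetical type and field names; the real HOList API is a protobuf/gRPC service and differs in detail) of what the three request types above might look like for a Lean version:

from dataclasses import dataclass, field
from typing import List

# Hypothetical request/response shapes mirroring the three HOList-style calls.
# Names and fields are illustrative only, not the actual HOList protobuf schema.

@dataclass
class ApplyTacticRequest:
    goal: str               # pretty-printed (or serialized) goal
    tactic: str             # tactic plus parameters, e.g. "induction n"

@dataclass
class ApplyTacticResponse:
    success: bool
    new_goals: List[str] = field(default_factory=list)  # empty list means the goal is closed
    error: str = ""

@dataclass
class VerifyProofRequest:
    theorem: str            # statement to prove
    tactics: List[str]      # proof as a list of tactic applications

@dataclass
class RegisterTheoremRequest:
    theorem: str
    proof: List[str]        # e.g. the proof found by VerifyProof
    name: str = ""          # identifier under which the theorem is registered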

Jason Rute (Jan 20 2020 at 02:54):

I looked a bit into Daniel's old Python bindings for Lean. They aren't documented, but the test code gives a hint at what they can do. It looks promising. However, I can't get them installed correctly. (It is probably my own lack of understanding of C++ and the other things involved in the install.) Someone also suggested that the bindings may not work with current Lean. Has anyone got them to work recently?

Jason Rute (Jan 20 2020 at 03:09):

Also, I think I said something wrong about LSP. I suggested that Lean 3 uses LSP; I guess that will be Lean 4. Lean 3, as I understand it, can communicate via lean --server by receiving JSON messages on STDIN and sending JSON responses on STDOUT. It seems that one can deduce the Lean 3 message format from the lean-client-js-core package. Honestly, this is getting outside my wheelhouse, so I would have to do a lot of digging and learning to figure out how this all works. However, if I do, I'll try to write up some description of what is possible here and whether it fits the needs of something like DeepHOL.

Bryan Gin-ge Chen (Jan 20 2020 at 03:13):

Feel free to ask here if you've got questions about Lean 3's server mode!

Jesse Michael Han (Jan 20 2020 at 03:30):

you can observe the Lean 3 server protocol in Emacs by enabling M-x lean-turn-on-debug-mode

Bryan Gin-ge Chen (Jan 20 2020 at 03:34):

You can also see it in action in the community web editor if you click the question mark, scroll down to the "Debug settings", and then open your browser's console.

Jason Rute (Jan 21 2020 at 00:37):

Taking @Bryan Gin-ge Chen's advice, I looked at Lean's server mode through the community web editor. First, I could only get it to work on Firefox (it didn't work on Safari or Chrome on my Mac). Second, when @Bryan Gin-ge Chen says my "browser's console", that means go to Tools > Web Developer > Web Console.

Jason Rute (Jan 21 2020 at 00:37):

From my limited experimentation, there appear to be three requests one can make of the server: "info", "sync", and "complete". I'm still trying to figure out what these three do. I should probably next try to interact directly with the Lean 3 server via Python. Until I do that, it appears that "sync" does most of the heavy lifting. It appears to send the whole file contents.

{
  "command": "sync",
  "file_name": "/test.lean",
  "content": "example (m n : ℕ) : m + n = n + m :=\nby refl",
  "seq_num": 40
}

And then one gets back responses about the state of the file, including messages about tactics which don't work.

{
  "msgs": [
    {
      "caption": "",
      "file_name": "/test.lean",
      "pos_col": 3,
      "pos_line": 2,
      "severity": "error",
      "text": "invalid apply tactic, failed to unify\n  m + n = n + m\nwith\n  ?m_2 = ?m_2\nstate:\nm n : ℕ\n⊢ m + n = n + m"
    }
  ],
  "response": "all_messages"
}

The "info" command seems to be for getting more information about variable types, squiggly lines, and other information displayed in the side bar. I think the "complete" command is for autocompletion. (Again, using a Python (or other) client to manually communicate with lean --server would make all this clearer.)
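
(As a first step in that direction, here is a minimal Python sketch of driving lean --server directly over STDIN/STDOUT with the "sync" request above. It is only a sketch; exactly which responses to wait for may need adjusting.)

import json
import subprocess

# Start the Lean 3 server; it reads one JSON request per line on stdin
# and writes one JSON response per line on stdout.
proc = subprocess.Popen(
    ["lean", "--server"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    universal_newlines=True, bufsize=1,
)

def send(request):
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()

def read_response():
    return json.loads(proc.stdout.readline())

# Sync a (virtual) file and wait for the server to report its messages.
send({
    "seq_num": 1,
    "command": "sync",
    "file_name": "/test.lean",
    "content": "example (m n : ℕ) : m + n = n + m :=\nby refl",
})
while True:
    resp = read_response()
    print(resp)
    if resp.get("response") == "all_messages":
        break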

Jason Rute (Jan 21 2020 at 00:38):

If the "sync" command is the natural way to try out tactics on a goal via the Lean 3 server, that would be a pretty blunt tool, but it could be workable. One could enter the goal with a hole (e.g. example (m n : ℕ) : m + n = n + m := begin end). Then one can try filling in the hole with various tactics and see what the responses are, including the new goal state (e.g. enter example (m n : ℕ) : m + n = n + m := begin induction n end). However, it seems that the agent would have to interact with the actual Lean code and parse the pretty printed human readable responses (e.g. the agent would have to parse: "tactic failed, there are unsolved goals\nstate:\n2 goals\ncase nat.zero\nm : ℕ\n⊢ m + 0 = 0 + m\n\ncase nat.succ\nm n_n : ℕ,\nn_ih : m + n_n = n_n + m\n⊢ m + nat.succ n_n = nat.succ n_n + m"). This isn't impossible. There are only a few message templates the agent would have to deal with, and possibly one could turn off some pretty printers to make the formulas more machine readable. Nonetheless, I think something like the Python bindings looks more promising.

Jason Rute (Jan 21 2020 at 00:38):

Am I understanding this correctly, @Bryan Gin-ge Chen?

Jason Rute (Jan 21 2020 at 01:02):

With pretty printing, the goal mentioned above is:

tactic failed, there are unsolved goals
state:
2 goals
case nat.zero
m : ℕ
⊢ m + 0 = 0 + m
case nat.succ
m n_n : ℕ,
n_ih : m + n_n = n_n + m
⊢ m + nat.succ n_n = nat.succ n_n + m

After setting set_option pp.all true, it becomes

tactic failed, there are unsolved goals
state:
2 goals
case nat.zero
m : nat
⊢ @eq.{1} nat (@has_add.add.{0} nat nat.has_add m nat.zero) (@has_add.add.{0} nat nat.has_add nat.zero m)

case nat.succ
m n_n : nat,
n_ih : @eq.{1} nat (@has_add.add.{0} nat nat.has_add m n_n) (@has_add.add.{0} nat nat.has_add n_n m)
⊢ @eq.{1} nat (@has_add.add.{0} nat nat.has_add m (nat.succ n_n)) (@has_add.add.{0} nat nat.has_add (nat.succ n_n) m)

which is fairly machine parsable (on a level similar to the s-expressions used by HOList/DeepHOL). Given that tools like HOList don't need the fastest server, it would be interesting to see if something like the HOList server interface could be built from the Lean server.
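
(To illustrate how parsable this already is, here is a rough Python helper that splits such a state into hypotheses and targets. It is just a sketch against the exact layout shown above; it is not robust to targets that wrap across lines.)

import re

def split_goals(state):
    """Roughly split a pretty-printed tactic state into (hypotheses, target)
    pairs.  Assumes each goal is an optional 'case ...' line, some hypothesis
    lines, then a single line starting with '⊢'.  Illustrative only."""
    goals, hyps = [], []
    for line in state.splitlines():
        line = line.strip().rstrip(",")
        if not line or line.startswith("case ") or re.match(r"^\d+ goals?$", line):
            continue
        if line.startswith("⊢"):
            goals.append((hyps, line[1:].strip()))
            hyps = []
        elif ":" in line:
            hyps.append(line)
    return goals

state = """2 goals
case nat.zero
m : nat
⊢ m + 0 = 0 + m

case nat.succ
m n_n : nat,
n_ih : m + n_n = n_n + m
⊢ m + nat.succ n_n = nat.succ n_n + m"""

print(split_goals(state))
# [(['m : nat'], 'm + 0 = 0 + m'),
#  (['m n_n : nat', 'n_ih : m + n_n = n_n + m'], 'm + nat.succ n_n = nat.succ n_n + m')]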

Bryan Gin-ge Chen (Jan 21 2020 at 01:10):

Most of what you wrote about the commands looks correct to me. There are actually a few more commands, like "search" and "hole", that are used for text editor integration. If you can read TypeScript, you can also get an idea of the interface for the server commands by browsing this and this from lean-client-js-core.

However, it seems that the agent would have to interact with the actual Lean code and parse the pretty printed human readable responses

I'm not a Lean metaprogramming expert, but I think it should be possible to write meta code which will return whatever info you want in machine-readable form as well. For instance, the info in the error messages you quoted is available from tactic.local_context. See mathlib's "tactic writing" tutorial for some more info.

The hole commands might also be interesting to you, since they let you trigger Lean meta code at positions in the code ("holes") that are surrounded by {! and !}. Among other things, you can write meta code which returns a string in the "replacements" field of the response message; you might be able to use this to query Lean for something you're interested in and return it in JSON or some other format. Here's some docs on the hole commands provided by mathlib.

Jason Rute (Jan 21 2020 at 01:19):

Ok interesting. Thanks!

Reid Barton (Jan 21 2020 at 01:51):

From what I recall, if you've used Lean from inside VS Code/emacs, you've pretty much directly experienced all the functionality the server mode has to offer

Jason Rute (Jan 21 2020 at 02:03):

The more I think about it, this VS Code/emacs functionality is probably all that is needed to reproduce something like the HOList interface. I think the only added thing would be to use a small amount of custom meta programming to get the outputs (such as the current goal state after applying a tactic) in the most useful form. (And even this is optional since the current text outputs are fairly useful and parsable.) The only worries I have are that (1) it is not fast enough, and (2) the idea I have of how to implement it is too hacky. Maybe I'll sketch out my idea soon since it is pretty straightforward.

Jason Rute (Jan 21 2020 at 12:41):

Here is how one could implement something very similar to HOList’s API but for Lean, using the Lean server. (As with the HOList API, I am assuming tactics are only applied to single goals and not to lists of goals.)

Jason Rute (Jan 21 2020 at 12:41):

The HOList API has an ApplyTactic request which sends a goal and a tactic command (with parameters). The response is either a success or failure. A success returns the new goal state. In the Lean Server (ignoring right now the details of how to represent formulas), all I would have to do is take my goal:

m n_n : ℕ,
n_ih : m + n_n = n_n + m
⊢ m + nat.succ n_n = nat.succ n_n + m

and my tactic (plus parameters) induction m.
Next, I convert this all to an example:

example (m n_n : ℕ)
        (n_ih : m + n_n = n_n + m)
        : m + nat.succ n_n = nat.succ n_n + m :=
begin
induction m
end

I put this in a file and run sync on it.
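
(In code, that translation from a goal plus a tactic to a sync request is just string assembly. A Python sketch, with hypothetical helper names, reusing the request shape shown earlier in this thread:)

import json

def make_example(hypotheses, target, tactic):
    """Wrap a goal and a candidate tactic into a standalone `example`
    that the Lean server can check.  `hypotheses` is a list of strings
    like "n_ih : m + n_n = n_n + m"."""
    binders = " ".join(f"({h})" for h in hypotheses)
    return (f"example {binders}\n"
            f"        : {target} :=\n"
            f"begin\n{tactic}\nend\n")

def make_sync_request(content, seq_num, file_name="/test.lean"):
    return {"seq_num": seq_num, "command": "sync",
            "file_name": file_name, "content": content}

src = make_example(
    hypotheses=["m n_n : ℕ", "n_ih : m + n_n = n_n + m"],
    target="m + nat.succ n_n = nat.succ n_n + m",
    tactic="induction m",
)
print(json.dumps(make_sync_request(src, seq_num=1), ensure_ascii=False, indent=2))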

Jason Rute (Jan 21 2020 at 12:42):

Implementing HOList’s VerifyProof request is even more straightforward. It is given a theorem (e.g. ⊢ ∀ m n : nat, m + n = n + m) and a proof, which is a list of tactics (with parameters) (e.g. [intro, intro, simp]), and returns whether that proof succeeds. Just put it into a file and check it with the server:

example : ∀ m n : nat, m + n = n + m :=
begin
intro;
intro;
simp
end

Jason Rute (Jan 21 2020 at 12:42):

Finally there is HOList’s RegisterTheorem. If one is using it to register a theorem (e.g. (m n : ℕ) ⊢ m + n = n + m), one just needs to enter the theorem into Lean with a unique identifier. If one is ok with sorries, then

theorem thm_1862938746298 (m n : ℕ) : m + n = n + m := sorry

If, like HOList, one wants to use the proof found by VerifyProof (see my documentation of the HOList API), then use that proof (just [simp] in this case):

theorem thm_1862938746298 (m n : ℕ) : m + n = n + m :=
begin
simp
end

These registered theorems should probably go in another file so they don’t have to be checked every time.

Jason Rute (Jan 21 2020 at 12:42):

Also, there is adding definitions via RegisterTheorem. Again, this seems straightforward.

Jason Rute (Jan 21 2020 at 12:42):

I think the next thing would be to try this out (calling the Lean server as above) from a simple Python client to make sure it works.

Jason Rute (Jan 21 2020 at 12:42):

I’m not saying this will 100% fulfill the HOList API since that was intended for HOL Light, but it will get 90% of the way there. Then we can have a discussion about the other 10%. (Ok, maybe that remaining 10% will take 90% of the time. One still needs to catalog the Lean tactics and their parameters for example.)

Christian Szegedy (Jan 21 2020 at 21:14):

We would be super happy to help support Lean from HOList. At the end of last October I asked Daniel Selsam his opinion on integration.
He suggested that we wait until Lean 4 is ready (he suggested that it would take several months on their side to get there).

I think that given the relative simplicity of our interface and the HOL Light interface as a template, it should not be a huge amount of work to integrate Lean with HOList and our team would be very happy to support this integration on our side.

Jason Rute (Jan 21 2020 at 22:49):

The official line of Microsoft Research does seem to be “wait until Lean 4”, and this makes sense. It is a waste of their time to support Lean 3. Nonetheless, there is a large and vibrant community using (and hacking) Lean 3, and it isn’t unreasonable to build something in Lean 3. If nothing else, it helps make clear what we need in Lean 4. From my investigation so far, it is plausible that the Lean server can be used to implement something like HOList for Lean 3. (And there would be no need to fork Lean.) IMO, after building a hacked-together Lean 3 to DeepHOL interface (which lives safely in a Docker container), that would provide a template for a more flexible Lean-to-MLGym interface that would satisfy more researchers’ needs.

Simon Hudon (Jan 21 2020 at 22:55):

Speaking as someone who's both hacking on Lean 3.5 and Lean 4, I don't see using Lean 3 as a waste of time.

Simon Hudon (Jan 21 2020 at 22:57):

Porting code from Lean 3 to Lean 4 might not be completely straightforward but what you learned about Lean 3 should be transferable to Lean 4. Lean 4 is really an attempt at making all the goodies that the community loves in Lean 3 even better. If you discover a pain point (that others haven't discovered) you can already point it out and it could get better in Lean 4

Simon Hudon (Jan 21 2020 at 22:59):

(I say "it could get better" because it has to fit in the overall vision of Lean 4, we won't introduce ML-style modules for instance)

Minchao Wu (Jan 21 2020 at 22:59):

Interacting with the Lean server is basically how Rob and I implemented the interface between Lean 3 and Mathematica. Jesse and I had also discussed the possibility of doing this for ML at last year’s ITP. Talking to the Lean server should be fine; the challenging part is parsing and managing proof states from what’s returned by Lean. For that we probably want some meta-programs to do it for us, and that’s why Jesse and I have also decided (for now) to wait until Lean 4 to see if there are new meta-level tools to use.

Jason Rute (Jan 25 2020 at 03:40):

@Bryan Gin-ge Chen I've been playing around with the lean server. The good news is that I can get it to behave in a back-and-forth manner where I feed it a goal and a tactic and it tells me if that tactic succeeds and what the goal state is. The bad news is that it is really slow. The only way I can reliably get it to work is to:

  • feed in my partial proof to a "sync" request
  • wait for it to send back an "all_messages" response
  • parse that response

This consistently takes about 0.1 seconds, which is REALLY slow for this sort of simple round trip. I thought I could use the "info" request instead (which is much faster), but that is unreliable. The response to an info request often references a previous version of the file from before the sync request. Do you know of a faster way, or is it a fundamental limitation of the Lean server that it takes 0.1 seconds to fully process a sync request?

Jason Rute (Jan 25 2020 at 03:46):

Also, a way to make the info request reliable is to use a new file for each call, but again, if I do that it is still 0.1 seconds per call, so there is no speedup from using the "info" request.

Jason Rute (Jan 25 2020 at 03:56):

For Google research, I think 0.1 seconds would be fine (and I understand that some tactics would inherently take a while to execute), but it would make it hard for us mere mortals to do cool stuff.

Bryan Gin-ge Chen (Jan 25 2020 at 04:06):

Unfortunately I don't know of a faster way. Maybe @Gabriel Ebner might have ideas?

Jason Rute (Jan 25 2020 at 04:09):

And, either way, I'll play more with this and put some simple code on GitHub in the hopes that:

  • Someone with a lot of computing power (e.g. Google research) can make use of it
  • Someone who knows better can tell me a faster way
  • Those building Lean 4 know what sort of things those in ML would like to be able to do.

Mario Carneiro (Jan 25 2020 at 04:41):

I think the more sustainable option is to hack lean to have an appropriate RPC or FFI interface

Jason Rute (Jan 25 2020 at 05:56):

I was worried someone would say that. I know Daniel Selsam has some old hacked Lean python bindings (which I don't think currently work). I don't know if those might be along the same lines. Anyway, I don't think I'm going to start mucking in the Lean C++ code soon, but someone else is welcome and encouraged to try.

Jason Rute (Jan 25 2020 at 21:11):

Ok, batching (sending about 32 goals/tactics in the same sync request) gives me about a 15x speedup. I don't know that I can do much better with the Lean server, but 7 ms per call does seem more reasonable than 100 ms.
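
(For the record, the batching is nothing clever: concatenate many independent example blocks into one file, sync it once, and attribute the returned messages back to each block by line number. A Python sketch of that bookkeeping, with hypothetical helper names:)

import bisect

def batch_examples(examples):
    """Concatenate independent `example ... begin ... end` snippets into one
    file, remembering the starting line of each, so that the messages in the
    server's `all_messages` response can be attributed back via `pos_line`."""
    start_lines, line = [], 1
    for src in examples:
        start_lines.append(line)
        line += src.count("\n") + 1   # advance by the number of lines in this snippet
    return "\n".join(examples) + "\n", start_lines

def attribute_messages(messages, start_lines):
    """Map each server message to the index of the snippet it refers to."""
    per_example = [[] for _ in start_lines]
    for msg in messages:
        idx = bisect.bisect_right(start_lines, msg["pos_line"]) - 1
        per_example[idx].append(msg)
    return per_example

examples = [
    "example (m n : ℕ) : m + n = n + m :=\nbegin\ninduction n\nend",
    "example (p q : Prop) (h : p ∧ q) : q :=\nbegin\nexact h.right\nend",
]
content, starts = batch_examples(examples)
fake_msgs = [{"pos_line": 3, "severity": "error", "text": "..."}]
print(attribute_messages(fake_msgs, starts))   # the error belongs to the first example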

Jason Rute (Jan 25 2020 at 21:13):

By comparison, HOList takes about 3ms, so it is comparable (but that is without batching).

Christian Szegedy (Jan 28 2020 at 00:47):

For Google research, I think 0.1 seconds would be fine (and I understand that some tactics would inherently take a while to execute), but it would make it hard for us mere mortals to do cool stuff.

Currently our timeout for proving is 5 seconds, because we make heavy use of the MESON tactic. We have a lot of other overhead; in particular, the neural networks become the bottleneck if we go to the sub-second range. We are perfectly fine with 0.1 seconds for now.

Stanislas Polu (Jan 28 2020 at 08:03):

Currently our timeout for proving is 5 seconds, because we make heavy use of the MESON tactic. We have a lot of other overhead; in particular, the neural networks become the bottleneck if we go to the sub-second range. We are perfectly fine with 0.1 seconds for now.

On that note, is this the default config on the docker image?

Jason Rute (Jan 28 2020 at 23:05):

Ok, for those interested, here is a prototype of communicating with Lean via the Lean Server from an external program. It looks like it is possible to implement a Lean version of HOList's ApplyTactic request. I think the next step would be to see if we can get a list of all Lean theorems and if we can plug them into this interface (specifically see if we can run the do-nothing skip tactic on the theorem goal). I'm not entirely happy with how this turned out. The Lean server is really not intended for this purpose and had to be fought with a lot, but I think it works now (if a bit slowly).

Jason Rute (Jan 28 2020 at 23:15):

Also, if anyone is interested in using the Lean server for another purpose (other than a text editor/IDE) I hope this would give them inspiration.

Jason Rute (Jan 28 2020 at 23:19):

@Jesse Michael Han I think you told me you can get a list of all theorems from the environment. How does one do this?

Scott Morrison (Jan 28 2020 at 23:20):

Don't have time to go into detail now, but look at the implementation of library_search, which does exactly this.

Minchao Wu (Jan 28 2020 at 23:27):

You can also find a similar implementation in our Mathematica-Lean link repo

Minchao Wu (Jan 28 2020 at 23:27):

https://github.com/minchaowu/mm-lean/blob/master/src/dump.lean

Minchao Wu (Jan 28 2020 at 23:30):

By the way, we’ve done similar things (i.e., an interface with experimental RL results) for HOL4, which is to appear in this year’s AITP.

Jason Rute (Jan 28 2020 at 23:54):

Oh cool. I look forward to reading all the slides and abstracts from AITP this year.

Jason Rute (Jan 28 2020 at 23:56):

Also, @Minchao Wu, did you interface with Mathematica in a similar way to how I did it above? You used the lean server, right?

Minchao Wu (Jan 28 2020 at 23:59):

Yes, it works the same way as far as I can see from your repo, except that there are no Python scripts involved. The Lean server subprocess is handled directly by Mathematica.

Minchao Wu (Jan 29 2020 at 00:01):

We also called the virtual file "dummy.lean" :-)

Jason Rute (Jan 29 2020 at 05:30):

Thank you @Minchao Wu ! Your code has been very helpful, but it also points out a number of subtle flaws in my current implementation. I was too quick to claim victory. :(

  • I don't handle universes, so if I try to put in the goal Π {α : Sort u} {a : α} {p : α → Sort v} {b : α}, a == b → p a → p b, Lean will complain about u and v. I could store the universes with the goal, and use a section to enter them, so that isn't a huge deal-breaker.
  • Lean's pretty printed goal output is not necessarily valid Lean code. For example, this is the Lean pretty printed theorem, but ∀ (s : list char), string.mk_iterator {data := s} = {fst := list.nil char, snd := s} is not valid Lean. The problem is that in list.nil char, the char is implicit (the type of list.nil is Π {T : Type}, list T). I would have to know to enter it as @list.nil char or know that the char can be inferred. Similarly, and.symm = and.swap isn't valid Lean. It needs to be ∀ a b : Prop, @and.symm a b = @and.swap a b
  • Without knowing the theorems exact place, Lean can't fill in type classes. For example, if I try to enter ∀ (c : Prop) [_inst_1 : decidable c] (a b : ordering), ite c a b = ordering.gt = ite c (a = ordering.gt) (b = ordering.gt) I get complaints about "failed to synthesize type class instance [...]".
  • This isn't a problem with my app, but it should be noted that auto-generated equation theorems for definitions and type class instances end up in the dump. For example, ∀ {α : Type u₁} {β : Type u₂} (a : α) (b : β), combinator.K a b = a and ∀ {α : Type u}, list.has_append = {append := list.append α}. They aren't really something that is "provable" except by using themselves. At least refl solves them.

Jesse Michael Han (Jan 29 2020 at 06:39):

Jesse Michael Han I think you told me you can get a list of all theorems from the environment. How does one do this?

@Jason Rute you can do something like this in Lean (the code in library_search is a bit more specialized, but essentially does the same thing):

meta def process_thm (d : declaration) : option declaration :=
let n := d.to_name in
  if ¬ d.is_trusted ∨ n.is_internal then none
  else match d with
       | declaration.defn _ _ _ _ _ _ := none
       | t@(declaration.thm n ns e te) := some t
       | declaration.cnst _ _ _ _ := none
       | declaration.ax _ _ _ := none
       end

meta def library_thms : tactic $ list declaration :=
  environment.decl_filter_map <$> get_env <*> return process_thm

meta def list_all_theorems : tactic unit :=
  do library_thms >>= λ x, tactic.trace (declaration.to_name <$> x)

run_cmd list_all_theorems

Patrick Massot (Jan 29 2020 at 08:12):

@Jason Rute Did you look at how format_lean and the lean crawler handle all those issues? leancrawler is entirely devoted to listing declarations and statistics about them. And format_lean does need tactic state information from python, implementing a very rough interface at https://github.com/leanprover-community/format_lean/blob/master/src/format_lean/server.py.

Minchao Wu (Jan 29 2020 at 11:23):

@Jason Rute The issues you mentioned are exactly some of the many reasons we prefer to delay the implementation until Lean 4. There could be even more issues with parsing and managing proof states, e.g., sometimes you might also see meta-variables occurring in the local context so that you can't directly send them back (as-is) to Lean as assumptions.
For toy examples, the hardcore parsing you have done would work well, but for serious theorems we probably need a more robust (and fast) system. It seems to me that the ideal way of doing it is to have a Lean meta-program that manages proof states for us and returns Python-readable objects containing the information we need. In that case, Python only needs to be responsible for sending proper requests and doing machine learning.

Minchao Wu (Jan 29 2020 at 11:26):

Also, this is why we prefer systems that implement a simple type theory (at the current stage). Too much information about types not only increases the amount of knowledge the agent needs to learn, but also complicates proof-state management.

Jason Rute (Jan 29 2020 at 13:44):

@Minchao Wu When you say "the hardcore parsing", do you mean parsing the whole syntax tree of the expression? (I assume if we have that, we could reconstruct the expression in Lean, but even then I'm not sure if it has everything to unambiguously do that.)

Minchao Wu (Jan 29 2020 at 13:51):

I mean parsing the pretty-printed responses from the server on the Python side

Minchao Wu (Jan 29 2020 at 13:53):

...but even then I'm not sure if it has everything to unambiguously do that.)

Yes, hopefully Lean 4 would offer us more tools.

Jason Rute (Jan 29 2020 at 14:04):

What is it that Lean 4 will provide that Lean 3 doesn’t have? That is not clear to me.

Minchao Wu (Jan 29 2020 at 14:09):

I don't know, either. But there could be more well-documented meta constants that expose more internal stuff.

Jason Rute (Jan 29 2020 at 23:45):

@Minchao Wu More experiments are needed, but if I work with the Lean expressions themselves, then my above issues go away. It might still not cover all (or most?) cases, but I think it is worth exploring more...

Jason Rute (Jan 29 2020 at 23:50):

For example, the following code allows me to create a goal from an expression.

meta def make_goal_from_expression (e : expr) : tactic unit :=
do
   v <- tactic.mk_meta_var e,
   gs <- tactic.get_goals,
   tactic.set_goals $ v :: gs,
   t <- tactic.target,
   tactic.trace t,
   return ()

meta def my_expr : expr :=  ( expr.pi ( name.mk_string "c" name.anonymous ) binder_info.default  ( expr.sort level.zero ) ( expr.pi ( name.mk_string "_inst_1" name.anonymous ) binder_info.inst_implicit ( expr.app ( expr.const ( name.mk_string "decidable" name.anonymous ) list.nil ) ( expr.var 0 ) ) ( expr.pi ( name.mk_string "a" name.anonymous ) binder_info.default  ( expr.const ( name.mk_string "ordering" name.anonymous ) list.nil ) ( expr.pi ( name.mk_string "b" name.anonymous ) binder_info.default  ( expr.const ( name.mk_string "ordering" name.anonymous ) list.nil ) ( expr.app ( expr.app ( expr.app ( expr.const ( name.mk_string "eq" name.anonymous ) ( list.cons ( level.succ level.zero ) list.nil ) ) ( expr.sort level.zero ) ) ( expr.app ( expr.app ( expr.app ( expr.const ( name.mk_string "eq" name.anonymous ) ( list.cons ( level.succ level.zero ) list.nil ) ) ( expr.const ( name.mk_string "ordering" name.anonymous ) list.nil ) ) ( expr.app ( expr.app ( expr.app ( expr.app ( expr.app ( expr.const ( name.mk_string "ite" name.anonymous ) ( list.cons ( level.succ level.zero ) list.nil ) ) ( expr.var 3 ) ) ( expr.var 2 ) ) ( expr.const ( name.mk_string "ordering" name.anonymous ) list.nil ) ) ( expr.var 1 ) ) ( expr.var 0 ) ) ) ( expr.const ( name.mk_string "gt" ( name.mk_string "ordering" name.anonymous ) ) list.nil ) ) ) ( expr.app ( expr.app ( expr.app ( expr.app ( expr.app ( expr.const ( name.mk_string "ite" name.anonymous ) ( list.cons ( level.succ level.zero ) list.nil ) ) ( expr.var 3 ) ) ( expr.var 2 ) ) ( expr.sort level.zero ) ) ( expr.app ( expr.app ( expr.app ( expr.const ( name.mk_string "eq" name.anonymous ) ( list.cons ( level.succ level.zero ) list.nil ) ) ( expr.const ( name.mk_string "ordering" name.anonymous ) list.nil ) ) ( expr.var 1 ) ) ( expr.const ( name.mk_string "gt" ( name.mk_string "ordering" name.anonymous ) ) list.nil ) ) ) ( expr.app ( expr.app ( expr.app ( expr.const ( name.mk_string "eq" name.anonymous ) ( list.cons ( level.succ level.zero ) list.nil ) ) ( expr.const ( name.mk_string "ordering" name.anonymous ) list.nil ) ) ( expr.var 0 ) ) ( expr.const ( name.mk_string "gt" ( name.mk_string "ordering" name.anonymous ) ) list.nil ) ) ) ) ) ) ) )

run_cmd make_goal_from_expression my_expr

The trace is correctly:

∀ (c : Prop) [_inst_1 : decidable c] (a b : ordering), ite c a b = ordering.gt = ite c (a = ordering.gt) (b = ordering.gt)

Jason Rute (Jan 30 2020 at 23:53):

Right now I see three ways to do what I am trying to do:

  1. Use the Lean server and work directly with Lean syntax. This is flexible and easy to enter and play with stuff, but quickly runs into parsing problems.
  2. Use the Lean server but work with plain text of Lean expressions for the goals. It is no longer human readable, but works more robustly. It is still possible to enter tactics as plain text, but that might have similar issues with robustness.
  3. Use a lean program to enter serialized expressions. This is really fast and robust. The problem is that I don't think it is easy to enter the tactics in such an easy way. We would have to create a custom syntax for tactics, but this is what HOList does already, so it would be similar. This is my vote, but I think I'll build all three.

Jason Rute (Jan 30 2020 at 23:55):

It might not be clear what I'm talking about, but for concreteness, see my recent notebook for (1) and see this thread for (3). I'll try to prototype all three.

Stanislas Polu (Feb 03 2020 at 09:30):

@Jason Rute really interesting to follow your progress. (First, I wanted to point out that as soon as you have something ready, I'm eager to try it out.) Can you elaborate a bit on the parsing problems you encounter with (1)? The speed does not seem too bad at all from what you report in your notebook. I was myself planning to fork format_lean. Are those parsing problems solved there? Why not start with it?

Jason Rute (Feb 03 2020 at 13:30):

@Stanislas Polu Thanks for the encouraging words. As for the "parsing problems" I'm referring to, see above. However, the more I think about this, I made it sound worse than it is. One problem is universe levels. Maybe I just need to add those to my interface. The main problem is that when I "dump" the current theorems of Lean (as training examples), the representation I get can't always be put back into my interface. However, as I look at this more, if I use the more verbose representation given by set_option pp.all true then that works much better. It is clear to me now that my next step is to dump all the Lean theorems (at least in core Lean) to a file and test which ones I can enter into my interface successfully.

Jason Rute (Feb 03 2020 at 13:33):

I also have some more comments for you, but I have to run...

Stanislas Polu (Feb 03 2020 at 13:55):

That sounds super encouraging! I'm still quite busy integrating against HOList but as soon as I'm out of the woods on this one I'll gladly lend a hand here :+1:

Jason Rute (Feb 03 2020 at 14:43):

Ok, it sounds like you are busy, but since you mentioned you are eager to try it out, my notebook code is still very usable with a few caveats:

  • I need to add support for universes and other binder types.
  • I might need to switch to another Lean output format to make it more robust (so that you can more robustly plug the output of one call into the input of the next, avoiding the issues I was talking about in my previous message).
  • It is just a bare-bones, low-level interface to Lean. You would have to build your own higher-level interface for your purposes. (However, I did provide a breadth-first search prover as an example of the ideas involved.) In particular, it doesn't come with any of these features:
    • A library of theorems (for either training examples or premise selection).
    • A curated list of tactics or a clear way to specify/enter their parameters (except as plain text).

Nonetheless, you can do a lot with it right away. If you look at the breadth-first prover example, this gives you a clear way to build a learned AI which works for minimal logic. The tactics apply _, cases _, intro, split, left, and right should be complete for minimal propositional logic. For intuitionistic propositional logic add the tactic exfalso. So you can train an intuitionistic logic AI, for example. You can also look at Buzzard's natural number game for other examples of tactic systems which one could play with. It all depends what you are going for...

Jason Rute (Feb 03 2020 at 14:45):

I'll work on shoring up this prototype, and I'm also working on a different prototype which directly enters the theorems into a Lean program which then parses them and runs the specified tactics. This has some advantages and disadvantages. I'm trying to compare and contrast both approaches. My hope is that after I make my prototypes, someone with more time can turn them into usable systems for their (and others') needs.

Stanislas Polu (Feb 03 2020 at 15:00):

I might need to switch to another Lean output format to make it more robust (so that you can more robustly plug the output of one call into the input of the next, avoiding the issues I was talking about in my previous message).

That looks very promising! Can you cc me (@spolu) on this diff?

It is just a bare-bones, low-level interface to Lean. You would have to build your own higher-level interface for your purposes. (However, I did provide a breadth-first search prover as an example of the ideas involved.) In particular, it doesn't come with any of these features:
- library of theorems (for either training examples or premise selection).
- curated list of tactics or a clear way to specify/enter their parameters (except as plain text).

I've seen your BFS example which looks great. First thing I would do is try to get a dump of theorems and their proofs in a format compatible with your environment. That's probably a bit involved but obviously a crucial step to enable ML work beyond pure RL.

Jason Rute (Feb 03 2020 at 23:02):

I don't know that proof recording (at the tactic level) is going to be possible (or at least easy) in Lean 3. It should be possible to record proofs in terms of basic Lean inference rules (think "assembly language" for theorem proving), but not the high level tactics. (I guess anything is possible if one modifies the C++ code, but I have no idea where to start with this.) Instead, I'm going to record just the theorem statements. This would be enough to do reinforcement learning (and when I talked to Markus Rabe, it seems for Google's work, this is all that they need).

Jason Rute (Feb 03 2020 at 23:03):

Of course, if there is a way to record proofs, this is the place to ask. Does anyone know a way to record tactic-proofs?

Stanislas Polu (Feb 04 2020 at 09:54):

Isn't the interaction demonstrated in format_lean[0] enough to record proofs? For each statement we have the tactic state it is applied to and the tactic state it maps to?
[0] https://leanprover-community.github.io/format_lean/example/sandwich.html

Stanislas Polu (Feb 04 2020 at 10:00):

Instead, I'm going to record just the theorem statements. This would be enough to do reinforcement learning (and when I talked to Markus Rabe, it seems for Google's work, this is all that they need).

This is a very specific (and therefore somewhat limited) approach to AITP, I think it would be an error to over-index on that statement. Yes RL on statements works for Hammer-ish / Premise-selection-only type of automated theorem proving, but that's not everything there is to automated theorem proving, especially in Lean.

Jason Rute (Feb 04 2020 at 12:19):

First, to be clear, I was ignoring proof recording because (1) I didn't think it was easy and (2) we can do a lot without it (how much remains to be seen, but I don't think one can say with certainty that reinforcement learning only works on some types of projects and not others). If it is indeed easy to get that data, then of course I want it! (Also, remember that Lean proofs are human-written and Lean hasn't been around for that long. Right now I count about 8300 occurrences of the theorem keyword in mathlib. I'm not sure if that is enough training data or not.)

Jason Rute (Feb 04 2020 at 12:36):

Now as for getting the theorem data, so far I have been using the above mentioned approaches which are internal to Lean (i.e. looking at the Lean environment for all the declared theorems.) I hadn't considered doing it externally to Lean (i.e. using the Lean server to look at each line of a proof). This is very interesting and I'm curious how well it works. I think it would be fiddly, but it probably is the best option we have. As an implementation detail, recall that Lean proofs don't have to be linear, so one has to take into account nested structure. Also, recall that not all proofs are tactic proofs, some are term-mode proofs. (The format_lean approach might be a bit simplistic in that it seems that the human tells the formatter where to look by adding comments to the Lean file. We would need something that looks at the Lean file automatically, but I agree it is promising.)

Jason Rute (Feb 04 2020 at 12:37):

For now I am just going to focus on the bare essentials of getting the theorems and premises, but if you can get proof recording to work, I'm glad to help where I can.

Patrick Massot (Feb 04 2020 at 13:27):

Right now I count about 8300 occurrences of the theorem keyword in mathlib. I'm not sure if that is enough training data or not.

Most statements in mathlib are introduced by the lemma keyword.

Patrick Massot (Feb 04 2020 at 13:28):

The format_lean approach might be a bit simplistic in that it seems that the human tells the formatter where to look by adding comments to the Lean file. We would need something that looks at the Lean file automatically, but I agree it is promising.

The formatter looks at the beginning and at the end of each line, without human intervention (except maybe that each proof should start with a comment, I don't remember the current status).

Patrick Massot (Feb 04 2020 at 13:30):

That being said, format_lean is openly a temporary hack. We are all waiting for the Lean 4 parser that will give us access to everything, including white spaces and comments. Writing a formatter should be pure joy and much more powerful.

Jason Rute (Feb 04 2020 at 14:42):

Most statements in mathlib are introduced by the lemma keyword.

Ok. I stand corrected. I count about 11600 occurrences of the lemma keyword.

Reid Barton (Feb 04 2020 at 16:45):

Of course, if there is a way to record proofs, this is the place to ask. Does anyone know a way to record tactic-proofs?

I guess it depends on exactly what you have in mind, since the set of tactics is extensible, and while the C++-Lean tactic interface is fixed, it's probably lower-level than you want. Also, mathlib-style proofs use a lot of exact and refine with terms (pre-expressions) of nontrivial complexity, maybe because we do not have much automation to deal with those parts of proofs. So the syntax you would need to model is more complicated.

Reid Barton (Feb 04 2020 at 16:46):

In any case, it is certainly possible to do something along these lines by replacing tactic.interactive or begin ... end by your own version. I think the natural number game does something of this sort.

Jason Rute (Feb 05 2020 at 01:20):

Ok, the more I think about it, there is a spectrum of ways to “record” proof data. At the most “human” level one can just download all the Lean files as they are. Of course this doesn’t contain any parsing information or intermediate goal states or any other information private to Lean, but this is the sort of messy data that data scientists and machine learning researchers are used to working with, and one can still learn a lot from this. On the other extreme is to modify the tactic framework in Lean’s C++ code to record tactics. It would contain very specific information about tactics and subgoals, and hopefully let one replay a proof, but it might (???) be a lot of work to implement.

The intermediate level seems to be to go through all the proof files and inspect them with the Lean server. This seems like a good approach and is the one Stanislas is advocating for. It involves a bit of programming but not too much and one gets information private to Lean about goal states.

Even this has two extremes. Again, on the human side, one can treat each line as a “tactic” and just run the Lean server “info” command at the start of each line. If it shows goals, then you know you are inside a tactic proof and you can record the goals and the “tactic” (i.e. the line), which might be many tactic commands in one line or the partial beginning of a tactic command spanning multiple lines. Also, we wouldn’t be keeping track of the whole proof tree, but just what tactics to apply at each intermediate goal state. Another thing is that we would have no information about notation, implicits, or other things that Lean keeps track of. Even so, it would be very good training data. On the more “computer” side, one could carefully find the start and end of each tactic command and keep track of the full state of the proof, including universes, term expressions, etc. (maybe using “hole commands”), so that we could replay the whole proof perfectly. This again seems like a lot of work.

Anyway, I’d be in favor of doing the happy medium where one runs the Lean server “info” command on every line of a file. (Then later one could start to modify this, maybe looking at commas/semicolons or adding some hole commands, for as long as the benefits outweigh the cost.)
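
(A Python sketch of that happy medium, assuming the info response carries the goal state in a record field as in lean-client-js-core; the field names and the waiting logic should be double-checked, since as noted above the info responses can be unreliable right after a sync.)

import json
import subprocess

proc = subprocess.Popen(["lean", "--server"],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        universal_newlines=True, bufsize=1)
seq = 0

def request(req):
    """Send one request and block until the reply with the matching seq_num."""
    global seq
    seq += 1
    req["seq_num"] = seq
    proc.stdin.write(json.dumps(req) + "\n")
    proc.stdin.flush()
    while True:  # skip asynchronous messages until our reply arrives
        resp = json.loads(proc.stdout.readline())
        if resp.get("seq_num") == seq:
            return resp

def scrape_file(path):
    """For each line of a Lean file, pair the goal state at the start of the
    line (if any) with the text of the line itself (the 'tactic')."""
    text = open(path).read()
    request({"command": "sync", "file_name": path, "content": text})
    # In practice one also has to wait until checking finishes (the
    # all_messages responses); that bookkeeping is omitted here.
    pairs = []
    for i, line in enumerate(text.splitlines(), start=1):
        resp = request({"command": "info", "file_name": path,
                        "line": i, "column": 0})
        state = (resp.get("record") or {}).get("state")  # field names per lean-client-js-core
        if state:
            pairs.append((state, line.strip()))
    return pairs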

Stanislas Polu (Feb 05 2020 at 08:57):

@Jason Rute this seems like the most amenable format for modern machine learning techniques. There's always a trade-off around the granularity of the training signal you get/use. As an example, from past experience, training a model at the kernel level (which would be one of the extremes you propose) is not very efficient because the data is too granular and the number of steps required to complete a proof is too high with respect to how errors compound. At the other extreme, as you mention as well, there's the Lean code, but here the data is probably not granular enough, and generating a full proof's code in one go is probably just too hard to be achievable at scale (also you give up on "search").

Plugging in where the humans operate therefore appears to be a sweet spot in that trade-off (which is kind of obvious). I wouldn't worry too much about the data not including all the internal state of Lean; humans don't have access to it either, arguably. Also, one can always condition the prediction on more data coming from what precedes in the Lean file if need be.

All that to say +1 on your analysis :)

Jason Rute (Feb 05 2020 at 22:54):

To be clear, I don’t think anyone is talking about training at the kernel level (i.e. proof terms). The suggestions about modifying C++ are about training at the tactic framework level. I still think that in the future, when the tools are available, this is the best approach. (Of course, even then one can still use the human-readable pretty printed statements if that is indeed the best representation for training. One can just save these during the tactic-level proof recording.) However, until that is available, using the Lean server to inspect the files seems the right happy medium.

Christian Szegedy (Feb 06 2020 at 03:29):

@Stanislas Polu

This is a very specific (and therefore somewhat limited) approach to AITP, I think it would be an error to over-index on that statement. Yes RL on statements works for Hammer-ish / Premise-selection-only type of automated theorem proving, but that's not everything there is to automated theorem proving, especially in Lean.

DeepHOL-zero does not just focus on lemma selection. We do full proof-search with tactics without imitation. I think we could easily do forward proving without imitation, and I have hopes that we can do expression synthesis as well by training on (truncated) sub-trees of theorems or intermediate goal steps.

Stanislas Polu (Feb 06 2020 at 10:22):

We do full proof-search with tactics without imitation

Yes totally. Sorry I wasn't implying that you were not producing full proofs, just noting that the proofs rely on TACTIC selection + premise selection which does not, as you rightfully mention, cover term generation.

Jason Rute (Feb 15 2020 at 16:11):

I improved my prototype notebook which uses the Lean server to make a theorem proving API: https://github.com/jasonrute/communicating-with-lean/blob/master/communicate_with_lean.ipynb The big changes are to the output format which makes it more robust and I've also added the ability to add universes and imports. I also added a section mentioning all the current issues I've seen so far.

Jason Rute (Feb 15 2020 at 20:13):

I'm now working on scraping all of the Lean files. I should be able to get the following quite easily:

  • Every goal state which occurs inside a Lean tactic proof in its pretty printed form (and with a tiny amount more work, in its pp.all form).
  • The tactic command (including arguments) used on that goal state. (Some caveats here: if semicolons are used, that will fuse multiple tactic commands into one. With more work, I can separate them, but then it starts to get more complicated. Also, it might take some work to make sure I get a complete tactic command with arguments if that tactic command invokes other tactics. Parentheses matching is the first step.)
  • The doc string for each tactic used. (I don't think I have to manually identify which keywords are tactics versus theorems, etc., which is nice.)
  • Meta information about theorems and terms used as tactic arguments.

This will be a lot of really good information. Cross your fingers that this works...
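
(To fix ideas, one record of the resulting data set might look roughly like this. The schema is hypothetical, just for illustration; the actual dump is rawer than this.)

# Hypothetical shape of one scraped training example (illustrative only).
record = {
    "file": "mathlib/src/data/nat/basic.lean",          # made-up path
    "line": 123,
    "goal_pp": "m n : ℕ\n⊢ m + n = n + m",              # pretty-printed goal
    "goal_pp_all": "m n : nat\n⊢ @eq.{1} nat ...",      # with set_option pp.all true
    "tactic": "induction n",                             # the line applied to that goal
    "tactic_doc": "doc string of the induction tactic",  # placeholder text
    "arguments": ["n"],                                   # terms/theorems used as tactic arguments
}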

Stanislas Polu (Feb 16 2020 at 08:59):

This is super exciting. Do you already have a plan towards reconstructing proofs interactively?

Jason Rute (Feb 16 2020 at 11:41):

@Stanislas Polu What do you mean by reconstructing proofs interactively (especially the interactively part)?

Stanislas Polu (Feb 16 2020 at 13:09):

@Jason Rute the "ApplyTactic" API

Jason Rute (Feb 17 2020 at 22:57):

Sorry for the late reply @Stanislas Polu. I certainly have "ApplyTactic for Lean" in mind. I've never had the plan to single-handedly make a system which is at the level needed for HOList level machine learning, but I am trying to prototype all the necessary parts so it is easy for others to make the final system. If in the end I create a fully-functional system, then great! To recap, the notebook at https://github.com/jasonrute/communicating-with-lean/blob/master/communicate_with_lean.ipynb is a prototype for a fairly usable "state free" API for Lean. It however is nowhere near where HOList is. Here are next steps:

  • See how well we can enter goals without the system breaking. There are a couple of sources of goals: top-level declared theorems and intermediate goals used in actual proofs. I'm experimenting with entering both and seeing how it goes.
  • Come up with a library of tactics. Unless one wants the user to enter free-form text into the state-free API, we need to come up with a list of the most common tactics, the most common ways to use them, and an API for them. This is by far the most tedious process. I'm scraping all the Lean files, and this should give a lot of good data on stuff like this.
  • Figure out how to enter premises, local context variables, and other things. Will we give the user a method to enter premises (and definitions?), or will we be more restrictive and only let the user use theorems already in the environment?
  • Make sure the tactics aren't going to cheat and use the theorem to prove the theorem. Lean's simp tactic isn't really a pure function (its behavior changes as theorems are proved). This means if we are not careful, simp could solve a goal because that goal was added to simp's list of usable theorems. I guess a simple step would be to remove all simp-provable (and probably refl-provable) theorems from the list of testing examples. Another option is to not use simp, but the more restrictive simp only.

Anyway, I'm slowly working on this in my spare time. I will also have to take a break for a number of weeks soon. (Good personal life event coming up!)

Jason Rute (Feb 17 2020 at 23:06):

Also, in a day, I'll be done scraping all the Lean files with the info command. If you want access to the data (it's pretty raw!), let me know. Besides using it for simple analysis, statistics, and debugging, it would be a perfectly good machine learning data set: from a goal (pretty-printed with pp.all set to true), predict the next tactic used. (One can try to predict the tactic parameters too, but for a number of reasons this is still really messy.)

Stanislas Polu (Feb 21 2020 at 23:18):

Definitely interested in getting access to the data :+1:

Jason Rute (Feb 23 2020 at 00:43):

The extraction script, including a decent amount of documentation, is here: https://github.com/jasonrute/lean_info_scrapper. If you want the raw data, let me know your gmail account in a private message, and I can share the data via Google Drive (this seems easiest). It is less than 100 MB. I'm working on cleaning the data up to be more useful.

Stanislas Polu (Feb 23 2020 at 02:40):

Fantastic! I will go through the exercise of running it :+1:

Brando Miranda (May 11 2020 at 14:55):

I need to read this thread in more detail but from what I read, I am curious,
1) is it worthwhile developing an RL environment for any ITP given that HOList exists (or that there seems to be an interest in integrating them into HOList)?
2) What are the features of HOList as a system itself that are (or seem to be) superior to standard RL Gyms?
3) What are the advantages of integrating Lean (or any ITP) to HOList?
4) How easy is it to integrate any of these systems to HOList?

Jalex Stark (May 11 2020 at 22:42):

Brando I don't understand your question 1. My understanding is that HOList currently can only help you prove theorems in Isabelle/HOL. A reason to build learners that are not HOList is so that you can build one that is compatible with Lean, since you might like to write Lean code and have a computer help.

Brando Miranda (May 11 2020 at 22:53):

@Jalex Stark So is it only possible to incorporate ITPs that are based on higher-order logic (HOL) into HOList? (i.e. is it trivial to incorporate Isabelle?) I was under the impression the environment for HOList only supported HOL Light. I am trying to understand when it makes sense to extend HOList, its advantages & disadvantages for supporting other ITPs (and when it's worth building a new ITP env from scratch).

Jalex Stark (May 11 2020 at 22:59):

Oh I don't actually make any claim that HOList supports Isabelle, I just have always seen references to Isabelle/HOL and never figured out what it is.

Jalex Stark (May 11 2020 at 22:59):

I have no idea what it would mean to "extend" HOList to support Lean

Jalex Stark (May 11 2020 at 23:00):

My best guess at what one should do is write a thing that interacts with Lean "from scratch" by following the design of HOList closely

Jalex Stark (May 11 2020 at 23:00):

and yeah you might be able to re-use a lot of the code

Jalex Stark (May 11 2020 at 23:00):

but in the early days you shouldn't expect to usefully share weights between models trained on HOL examples and models trained on Isabelle examples

Jalex Stark (May 11 2020 at 23:02):

in order for one model to be effective in both domains, you'd effectively be requiring that the model has its own "internal type theory" that it can translate to and from both HOL and Lean

Jason Rute (May 12 2020 at 00:09):

First, let's address the narrow question about using HOList in Lean. As @Jalex Stark mentioned, HOList (and DeepHOL, the AI associated with the HOList environment) are designed for HOL-Light. For sake of argument let's consider the different ways that Lean (or another tactic-based ITP) could "use HOList".

  • Most directly, Lean could translate a Lean goal into HOL. This already isn't always possible (because of logical differences between HOL and DTT), but I think something similar is done in practice for hammer systems in Coq (though I'm really not an expert here). Then one can send that to a pretrained version of DeepHOL (there are many, but for the sake of argument let's choose the public version). However, DeepHOL and HOList actually have the vocabulary of HOL-Light hard coded (in the same way that many NLP models have all the English words hard coded as indices for each word). This will require not only that we translate the goal into HOL, but into HOL with a similar vocabulary to HOL-Light. While some concepts like the natural numbers are easy to align, it is incredibly difficult to translate between all of the concepts even in the same logic. There are academic papers written on this subject of concept alignment between ITPs. So we would have to align some of the most common concepts (like natural numbers, logical symbols, etc.) and just accept that we can't align all of the vocabulary. That, however, may not be a complete bust. HOList is capable of expanding to another vocabulary; one just has to mask the new definitions. Surprisingly, DeepHOL still does ok in this setting, e.g. on Flyspeck when trained on the core HOL-Light library. Ok, so taking stock, we have translated our Lean goal into an HOL-Light goal with HOL-Light syntax. However, DeepHOL also needs premises to select. We would have to translate Lean theorems into HOL-Light theorems and supply those as premises. Then after all that, DeepHOL will find an HOL-Light proof. This will need to be translated into a Lean proof, which is not straightforward either, since HOL-Light has some really powerful automation (e.g. MESON_TAC).
  • Slightly less directly, we could use HOList/DeepHOL as in the previous bullet, but only one tactic at a time. Then one would have to build a tree search in Lean and an environment for trying tactics. (We wouldn't be using it for RL yet, but it would serve a similar purpose to an RL gym.) This would have a better chance of guiding us to a correct proof, since we could correct the proof translation as we go. (A rough sketch of this stepwise loop is below, after this list.)
  • Now, there are a number of places we can go from here. For example, we can use the DeepHOL architecture but with a "LeanGym". Indeed the Google team has expressed interest in this. Another option is to just copy the concepts in the HOList paper. Again we need a LeanGym and maybe prerecorded proof data.
  • In the end, I think unless Google wants to support DeepHOL for Lean (retrained from scratch), the best we can directly extract from the DeepHOL algorithm is the graph embeddings. As you know, @Brando Miranda, all the state-of-the-art tools in image processing and natural language processing use pretrained networks (e.g. ResNet trained on ImageNet, BERT trained on a large corpus of English, etc.). There isn't anything similar yet for theorem proving. But all of Szegedy's recent work (and the talks for this year's AITP) strongly suggests that his team is working on this exact problem. In a year or two, I bet we will have some standard graph embeddings that could work for HOL-Light, Lean, Coq, Isabelle, etc. I still think we are going to have the vocabulary and logic alignment issue, but maybe the embeddings will work solely on the structure of the formula graph.
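
To make the second bullet concrete, here is a toy Python sketch of what that stepwise loop might look like. Everything in it is a hypothetical stand-in (the alignment table, translate_goal, predict_tactics, and apply_tactic are not real DeepHOL, HOList, or Lean code); it only illustrates the shape of the idea:

```python
"""Toy sketch of using a DeepHOL-style predictor one tactic at a time from
the Lean side.  All names are hypothetical stand-ins, not real APIs."""
from dataclasses import dataclass, field
from typing import List, Optional
import heapq

# A tiny hand-written alignment table between Lean and HOL-Light vocabulary.
# Real concept alignment is much harder; unaligned names would be masked.
CONCEPT_ALIGNMENT = {"nat": "num", "nat.add": "+", "eq": "="}

def translate_goal(lean_goal: str) -> str:
    """Token-by-token renaming, standing in for a real Lean-to-HOL translation."""
    return " ".join(CONCEPT_ALIGNMENT.get(tok, tok) for tok in lean_goal.split())

def predict_tactics(hol_goal: str) -> List[str]:
    """Stand-in for a call to a pretrained tactic predictor (e.g. over gRPC)."""
    return ["SIMP_TAC[]", "MESON_TAC[]"]

def apply_tactic(lean_goal: str, tactic: str) -> List[str]:
    """Stand-in for asking Lean to run (a translation of) the tactic.
    Here every tactic pretends to close the goal, so the example terminates."""
    return []

@dataclass(order=True)
class Node:
    cost: int
    goals: List[str] = field(compare=False)
    script: List[str] = field(compare=False)

def search(root_goal: str, budget: int = 100) -> Optional[List[str]]:
    """Best-first search over proof states, expanding one tactic at a time."""
    frontier = [Node(0, [root_goal], [])]
    while frontier and budget > 0:
        node = heapq.heappop(frontier)
        if not node.goals:
            return node.script  # every goal closed: return the tactic script
        goal, rest = node.goals[0], node.goals[1:]
        for tac in predict_tactics(translate_goal(goal)):
            budget -= 1
            new_goals = apply_tactic(goal, tac) + rest
            heapq.heappush(frontier, Node(node.cost + 1, new_goals, node.script + [tac]))
    return None

if __name__ == "__main__":
    print(search("eq ( nat.add a b ) ( nat.add b a )"))
```

In practice the hard parts are exactly the stubs: the goal translation, the vocabulary masking, and translating whatever proof is found back into Lean.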

Jason Rute (May 12 2020 at 00:34):

I think I misunderstood slightly. I think you are talking about incorporating Lean (and maybe Coq, Isabelle, etc) into HOList. Here are my thoughts on that. (Just a point of disambiguation here: I'm now using HOList to only mean the HOL-Light interface, not DeepHOL.)

  • HOList isn't much more than a thin wrapper around HOL-Light. I've documented it already. Honestly, to "incorporate Lean" into HOList basically means writing a completely separate wrapper around Lean, but with the same protobuf API. It would reuse none of the current HOList code (which is just a fork of the HOL-Light directory). I'm already working on making a prototype of such an interface. It is not for Google directly, but it could be used by Google (and it wouldn't be more than a week's work to write a protobuf wrapper around it).
  • The HOList interface is somewhat specific to HOL-Light. Dependent type theory, for example, has a notion of a local context, which I don't think HOList has any sense of. This would require an extension of the API; a hypothetical sketch of what that extension might look like is below, after this list. (And of course the tactics are different.)
  • I am 100% on board with having some standardized theorem-proving gyms for the common ITPs. Just as the Atari gym allowed a researcher to try a single algorithm on many Atari games, it would be great to make it easy to apply the same algorithm to many ITPs. It would also make transfer of knowledge between separate logics an important research topic.
  • However, I don't think this should be solely under the HOList umbrella. Google is doing AMAZING stuff and they are far ahead of anyone else, but they also don't have any interest in documenting their interface. I've seen many good AI researchers (most of them here on Zulip) attempt to work with HOList, only to get frustrated by the difficulty and give up. Even you, @Brando Miranda, pointed out that CoqGym is the only documented API right now. Also, no one so far has written a paper using the HOList interface apart from Google.
  • There is something that almost none of the current ML environments do well: none of them serve both as an RL gym for training an algorithm and as a tool for ITP researchers. DeepHOL is unusable as a tool even in HOL-Light. The TacticToe-style frameworks for Coq and HOL4 are designed to be usable as tools, but at the cost of the ML being trained in OCaml and SML. :( This greatly constrains the ability to get ML researchers involved and to rapidly prototype ideas. (Good luck using TensorFlow or PyTorch.) My vision is a system which is both: a gym usable by ML researchers, and a tool which can be used directly in Lean. That is what I'm trying to build (or at least prototype), and I think Lean has the tools for this. (Lean 4 will be even better!)
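
As a concrete (and entirely hypothetical) illustration of the local-context point above, here is what an HOList-style ApplyTactic request could look like once it is extended for dependent type theory. These are plain Python dataclasses standing in for protobuf messages; none of the names here are real HOList or Lean API:

```python
"""Hypothetical shape of an ApplyTactic-style request extended with a local
context.  Dataclasses stand in for protobuf messages; nothing here is real
HOList or Lean code."""
from dataclasses import dataclass, field
from typing import List

@dataclass
class Hypothesis:
    name: str        # e.g. "h"
    type_expr: str   # e.g. "a + b = c"

@dataclass
class Goal:
    # An HOList goal is essentially a single term; a Lean goal also carries
    # a local context of hypotheses and local definitions.
    target: str
    local_context: List[Hypothesis] = field(default_factory=list)

@dataclass
class ApplyTacticRequest:
    goal: Goal
    tactic: str          # e.g. "rw add_comm"
    timeout_ms: int = 5000

@dataclass
class ApplyTacticResponse:
    success: bool
    subgoals: List[Goal] = field(default_factory=list)
    error: str = ""

# Example request: a goal with two hypotheses in its local context.
req = ApplyTacticRequest(
    goal=Goal(
        target="a + b = b + a",
        local_context=[Hypothesis("a", "nat"), Hypothesis("b", "nat")],
    ),
    tactic="rw add_comm",
)
```

The gRPC/protobuf plumbing is elided; the point is just that each goal would have to carry its hypotheses, which the HOL-Light version never needs.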

Jalex Stark (May 12 2020 at 00:39):

https://github.com/LaurentMazare/ocaml-torch
obviously using pytorch in ocaml is harder than using it in python, but I think Laurent has done the hardest bits of work

Jason Rute (May 12 2020 at 00:43):

OK, I might have gone a bit too far, but I think avoiding Python for experimentation will alienate ML researchers and make prototyping harder. If, however, that is not the case, then by all means... (Also, even if the experimentation is done in Python, there is obviously value in incorporating the final ML pipeline into a "production" OCaml environment.)

Jason Rute (May 12 2020 at 00:46):

@Brando Miranda, I don't understand your question (2). HOList addresses a complicated and interesting problem that the standard RL domains don't. I don't, however, want to claim that no other RL environments share some of the difficulties of theorem proving. (Optimization, automatic code generation, and to some degree robotics also offer model-based RL opportunities with sparse rewards.) However, mathematical reasoning is intrinsically valuable in its own right. By analogy, NLP and image processing are studied both as a way to improve AI and because there are millions of dollars of business interest in those areas, as well as hope (probably false) that they will improve society. While mathematical reasoning isn't as lucrative, I think many see it as a holy grail of AI reasoning. (I think no one is saying that Szegedy's team is not ambitious enough. I usually hear the opposite.)

Jason Rute (May 12 2020 at 00:53):

Also, I should point out that, @Brando Miranda, you come at this from the point of view of an ML researcher looking for interesting problems to work on. Many here in this community just want to advance and improve theorem proving. I'd like to see systems like DeepHOL, TacticToe, etc. be easy to work with for the typical proof engineer. Since Lean is so popular, it makes sense to build tools to at least try to start adding powerful AI to Lean. (For example, in another thread here, someone asked about ranking the possible rewrite theorems. They aren't looking for SOTA graph embeddings, just some tool that works OK. Right now, even that is hard to do in Lean. I'd like to make that situation better.)

Jason Rute (May 12 2020 at 12:43):

(Another point before I head into work.) HOList is implemented as a "stateless API", meaning that after all the definitions and theorems are loaded, you pass in each goal state together with the tactic to apply. This is good for tree searching. However, I think it is much easier to do in HOL-Light than in many other ITPs, including Lean. For Lean, I tried to make this work, but ran into multiple problems. Some of them might be fixed in Lean 4, but I don't know about all of them. Instead, I've decided to go a different route where Lean keeps track of all the visited goal states in a proof for you and gives you an index for each. You can revisit any previously visited state by referencing that index. (Another approach could be to use the partially built tactic script as the "state": one can keep track of the proof scripts and use them to jump back to a previous state, but I was worried about performance here, since the tactic script would have to be re-run every time if my tree search bounces around too much. I wonder if this is how CoqGym does it. Do you know, @Brando Miranda?) A rough sketch of the indexed-state idea is below.
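
Here is a minimal Python sketch of the indexed-state bookkeeping I have in mind. It is purely illustrative: ProofStateTable and run_tactic_in_lean are hypothetical stand-ins for the real Lean interaction.

```python
"""Minimal sketch of indexed goal states: every visited state gets an index,
and the search can jump back to any index later.  run_tactic_in_lean is a
hypothetical stand-in for the real Lean interaction."""
from typing import Dict, List, Tuple

def run_tactic_in_lean(goal: str, tactic: str) -> List[str]:
    """Stub for the real Lean call; pretend every tactic closes the goal so
    the example is self-contained."""
    return []

class ProofStateTable:
    def __init__(self, root_goal: str):
        self._states: Dict[int, str] = {0: root_goal}    # index -> goal state
        self._parent: Dict[int, Tuple[int, str]] = {}    # index -> (parent index, tactic)
        self._next = 1

    def goal(self, index: int) -> str:
        return self._states[index]

    def apply(self, index: int, tactic: str) -> List[int]:
        """Apply a tactic to a stored state and return indices of the new
        subgoals.  Revisiting an old index later requires no re-running of a
        partial tactic script, unlike the "script as state" approach."""
        new_indices = []
        for subgoal in run_tactic_in_lean(self._states[index], tactic):
            self._states[self._next] = subgoal
            self._parent[self._next] = (index, tactic)
            new_indices.append(self._next)
            self._next += 1
        return new_indices

    def script(self, index: int) -> List[str]:
        """Reconstruct the tactic path from the root to a given state."""
        path = []
        while index in self._parent:
            index, tactic = self._parent[index]
            path.append(tactic)
        return list(reversed(path))

if __name__ == "__main__":
    table = ProofStateTable("⊢ a + b = b + a")
    print("new subgoal indices:", table.apply(0, "rw add_comm"))
    print("script to state 0:", table.script(0))
```

The trade-off is that the prover side has to hold on to every visited state, but the search can then hop between indices freely without replaying tactic scripts.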

Jeremy Avigad (May 22 2020 at 19:17):

A colleague of mine just pointed me to this:
https://www.quantamagazine.org/symbolic-mathematics-finally-yields-to-neural-networks-20200520/

Patrick Massot (May 22 2020 at 19:31):

This is the same dubious work that was discussed here a long time ago, right?

Jalex Stark (May 22 2020 at 19:31):

yeah, the preprint has been out for a while

Jalex Stark (May 22 2020 at 19:31):

it "just" guesses antiderivatives for a small class of functions, using neural nets and an NLP architecture to power the guessing

Patrick Massot (May 22 2020 at 19:32):

If I remember correctly the paper claims were borderline so I guess the journalist version goes way over the top

Jalex Stark (May 22 2020 at 19:32):

yeah, the folks at Quanta are good writers, but I don't have any understanding of their topic selection

Jalex Stark (May 22 2020 at 19:34):

one time they wrote an article about an unsurprising incremental improvement to a lower bound in incidence geometry; they did a lovely job explaining the problem, but way overstated the contribution of the paper they were responding to

Jalex Stark (May 22 2020 at 19:34):

I think they're really excited by the phrase "grad student does X"

Jeremy Avigad (May 22 2020 at 19:35):

Thanks for the assessment -- it saves me the time of looking into it.

Jason Rute (May 23 2020 at 13:31):

This topic has been discussed in the deep learning for symbolic mathematics topic before. For what it's worth, I think it is overall a good paper with interesting and thought-provoking results. I don't think it is nearly as dubious as Patrick or Jalex suggest. (I could go into more detail, but probably in that thread or a new one.)

Jalex Stark (May 23 2020 at 14:51):

Sorry, I didn't mean to be too dismissive. The paper does what it says it does: it guesses antiderivatives with a neural net, using a training loop that involves a differentiation algorithm but not an antidifferentiation algorithm. You could possibly apply the same idea to other inverse problems.
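
To make that concrete, here is a tiny Python/sympy sketch of the kind of training-data generation involved. It is only an illustration of the idea, not the paper's actual pipeline (which generates random expression trees and trains a seq2seq model on tokenized formulas):

```python
"""Toy illustration of generating (derivative, antiderivative) training pairs
using only symbolic differentiation, never an integration algorithm."""
import random
import sympy as sp

x = sp.symbols("x")
ATOMS = [x, x**2, sp.sin(x), sp.cos(x), sp.exp(x), sp.log(x + 1)]
OPS = [lambda a, b: a + b, lambda a, b: a * b]

def random_expression(depth: int = 2):
    """Build a small random expression over the atoms above."""
    if depth == 0:
        return random.choice(ATOMS)
    op = random.choice(OPS)
    return op(random_expression(depth - 1), random_expression(depth - 1))

def make_training_pair():
    """Return (input, target) = (f', f): the model learns to invert diff."""
    f = random_expression()
    return sp.diff(f, x), f

if __name__ == "__main__":
    fprime, f = make_training_pair()
    print("input  (derivative):    ", fprime)
    print("target (antiderivative):", f)
```

The inverse-problem framing is the key point: generating training pairs only needs the forward direction (differentiation).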

Jalex Stark (May 23 2020 at 14:52):

I just meant to counterbalance the quanta article, which is too much.

Christian Szegedy (Jun 26 2020 at 01:39):

Jalex Stark said:

I just meant to counterbalance the quanta article, which is too much.

For people like myself, who work with neural networks a lot, this seemed like a highly surprising result, as the network can predict relatively complex antiderivatives straight out of the box. I think it is a great paper.

Stanislas Polu (Aug 21 2020 at 20:39):

@Jason Rute I'm looking for the HTML file that visualizes the output of the scraper, but I can't find it anymore. Mind sharing the link here again? Thanks!

Stanislas Polu (Aug 22 2020 at 07:06):

Found it!

Jason Rute (Aug 24 2020 at 11:18):

@Stanislas Polu Sorry. I’ve been preoccupied. Do you have everything you need?

Stanislas Polu (Aug 24 2020 at 11:58):

Yes! All good :+1: Thanks!


Last updated: Dec 20 2023 at 11:08 UTC