Zulip Chat Archive

Stream: IMO-grand-challenge

Topic: general discussion


view this post on Zulip Miroslav Olšák (Oct 02 2019 at 14:07):

  1. Problems of the "find / determine" type
    I have collected the answers to problems of this type from all the shortlists available on the official IMO website, to get a general idea of what the answers can look like.
    http://www.olsak.net/mirek/determine-answers.txt

view this post on Zulip Miroslav Olšák (Oct 02 2019 at 14:07):

  1. Other domains.
    It is well known that IMO problems fall into four categories: "Geometry / Algebra / Number Theory / Combinatorics" (a friend of mine once came up with a nice comparison to "Imagination / Computation / Knowledge / Thinking" :-) ). The problem statements are relatively monotonous within a single domain unless there is a combinatorial flavour. The most difficult problems, from both the formalisation perspective and the problem-solving perspective, are imho problems from Combinatorial Geometry. I am for example curious about the formalisation of the windmill problem (2011-2, https://www.youtube.com/watch?v=M64HUIJFTZM).

view this post on Zulip Miroslav Olšák (Oct 02 2019 at 14:08):

  1. Partial points
    Some of the IMO problems are actually multiple (two) independent tasks, for example 2016-6 (= C7). Whether to allow partial points in such cases is worth consideration.

view this post on Zulip Daniel Selsam (Oct 03 2019 at 13:22):

I am for example curious about the formalisation of the windmill problem (2011-2, https://www.youtube.com/watch?v=M64HUIJFTZM).

@Miroslav Olšák For the windmill problem, are you concerned about formalizing the statement, finding a solution, or formalizing a solution?

view this post on Zulip Daniel Selsam (Oct 03 2019 at 13:27):

Are we interested in formalizing olympiad-like mathematical puzzles not necessarily coming from IMO?

@Miroslav Olšák There are many sources of problems that I think would provide valuable data for IMO, e.g. problems from national Olympiads. It is hard to say without knowing more whether your folklore list is worth the trouble of you translating it to English.

view this post on Zulip Daniel Selsam (Oct 03 2019 at 13:34):

  1. Partial points
    Some of the IMO problems are actually multiple (two) independent tasks, for example 2016-6 (= C7). Whether to allow partial points in such cases is worth consideration.

@Miroslav Olšák I agree. Unfortunately, I don't see any official statement concerning the number of points for solving sub-problems of problems. One option is to say "partial credit will be given for fully formalized sub-problems according to the number of points human judges would have awarded for the same sub-problems". What do you think?

view this post on Zulip Daniel Selsam (Oct 03 2019 at 13:37):

Because of that, I am now focused mainly on geometry, and I have translated the officially available shortlists into a semi-formal language (parseable, but without detailed semantics and not in any particular theorem prover so far). So I could help a bit with this part.

@Miroslav Olšák Nice! Can you share one example to give us a sense of the semi-formal language?

view this post on Zulip Daniel Selsam (Oct 03 2019 at 13:41):

  1. Problems of the "find / determine" type
    I have collected the answers to problems of this type from all the shortlists available on the official IMO website, to get a general idea of what the answers can look like.
    http://atrey.karlin.mff.cuni.cz/~mirecek/determine-answers.txt

@Miroslav Olšák Nice! What do you think about the current plan, of requiring human-assessment of the witnesses and only accepting witnesses that human judges would have accepted? See https://github.com/IMO-grand-challenge/formal-encoding/blob/master/design/determine.lean for more context.

view this post on Zulip Miroslav Olšák (Oct 03 2019 at 13:42):

Miroslav Olšák For the windmill problem, are you concerned about formalizing the statement, finding a solution, or formalizing a solution?

I am not concerned that formalizing the problem statement or formalizing the solution would be impossible; I just find it somewhat challenging, so I would like to see it. Of course, if an automated system (not designed for solving this particular problem) could find a solution to it, I would be super-impressed. However, I see the following issue with formalizing the problem statement: there are some facts about the problem that are so obvious (but really nontrivial to formally prove) that it is unclear whether they are actually part of the problem statement or part of the solution. I mean, for example, "given any initial line, there is exactly one windmill process".

view this post on Zulip Miroslav Olšák (Oct 03 2019 at 13:48):

Miroslav Olšák I agree. Unfortunately, I don't see any official statement concerning the number of points for solving sub-problems of problems. One option is to say "partial credit will be given for fully formalized sub-problems according to the number of points human judges would have awarded for the same sub-problems". What do you think?

It makes sense. Note that although there is no public document for the IMO marking scheme, the judges prepare it and agree on the marking scheme before checking the solutions.

view this post on Zulip Daniel Selsam (Oct 03 2019 at 14:14):

However, I see the following issue with formalizing the problem statement: there are some facts about the problem that are so obvious (but really nontrivial to formally prove) that it is unclear whether they are actually part of the problem statement or part of the solution. I mean, for example, "given any initial line, there is exactly one windmill process".

@Miroslav Olšák I agree that these are big challenges. One approach might be to have a DSL for geometric processes, and then to encode the windmill process as a program in the DSL.

windmill(S, s, l) =
  while true:
    l \gets rotateUntil(l, s, CLOCK_WISE, fun l => exists s' \in S, s \neq s' /\ on(s', l))
    s \gets choose({s' \in S : s' \neq s /\ on(s', l)})

Here it would be provable that the {s' \in S : ...} set always has exactly one element (since by assumption |S| > 1 and no three points are collinear) and thus the process is deterministic. The key insight required to solve the problem could then be cast as discovering an invariant of the program.

I am not sure what kind of semantics such a hypothetical DSL would warrant, probably operational, and bottoming out into some geometric object parameterized by time. I am also not sure whether we would have already had such abstractions before the windmill year, or whether the abstractions we can build now will be good enough for future problems.
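To make the determinism point concrete, here is a small Python sketch (an illustration, not the hypothetical DSL itself; all names are mine) of one windmill step. Because no two points define the same direction from the pivot (no three points collinear), exactly one point minimizes the clockwise rotation needed, so the next pivot is uniquely determined:

```python
import math

def windmill_step(points, pivot_idx, theta):
    """One windmill step: rotate the line through the pivot (direction theta,
    taken mod pi since a line is unoriented) clockwise until it first hits
    another point; that point becomes the new pivot."""
    px, py = points[pivot_idx]
    best = None
    for i, (qx, qy) in enumerate(points):
        if i == pivot_idx:
            continue
        phi = math.atan2(qy - py, qx - px) % math.pi  # direction at which the line hits q
        delta = (theta - phi) % math.pi               # clockwise rotation needed to reach q
        if delta == 0:
            delta = math.pi  # q is already on the line: a half-turn brings it back
        if best is None or delta < best[0]:
            best = (delta, i, phi)
    # "no three points collinear" guarantees the minimizer is unique,
    # which is exactly the determinism fact discussed above
    _, next_pivot, next_theta = best
    return next_pivot, next_theta
```

Iterating `windmill_step` then gives the unique windmill process from a given initial line, and the key invariant of the problem becomes a property of this loop.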

view this post on Zulip Daniel Selsam (Oct 03 2019 at 14:23):

As far as I know, the computational methods (Wu's method / Gröbner basis / ...) are stronger than any synthetic approach, and I have heard that they are capable of solving at least some of the IMO problems
...
Note that I also have a parser for that, so I can tell the types of the objects, possibly convert it to other format, etc.

@Miroslav Olšák I am very curious which existing problems can be solved by which existing tools. What do you think are the most relevant off-the-shelf tools to try?

view this post on Zulip Miroslav Olšák (Oct 03 2019 at 14:27):

Miroslav Olšák Nice! What do you think about the current plan, of requiring human-assessment of the witnesses and only accepting witnesses that human judges would have accepted? See https://github.com/IMO-grand-challenge/formal-encoding/blob/master/design/determine.lean for more context.

Well, putting a human in the loop is fine; it postpones the problem rather than solving it (but it may be a good thing to postpone). I also like the idea of a whitelist of allowed operations for every individual problem, where the problems available so far would help us prepare templates for such whitelists. But if a problem requiring something more complex emerged, we could simply modify the whitelist to allow what is necessary without providing many hints (this is actually also putting a human in the loop, just at another place).

view this post on Zulip Kevin Buzzard (Oct 03 2019 at 16:10):

This looks interesting : https://mathoverflow.net/a/337705 . I don't know if it's good enough to solve IMO problems though

view this post on Zulip Patrick Massot (Oct 03 2019 at 20:21):

Maybe this conversation should move to another Zulip topic. That one is meant to be used by the Zulip bot only. More seriously, it would make it easier to find this conversation again later. Miroslav, I think you can do the move by editing your first message in this thread.

view this post on Zulip Reid Barton (Oct 03 2019 at 20:22):

(Miroslav, Patrick is referring to the topic "stream events" in case it's not clear)

view this post on Zulip Miroslav Olšák (Oct 03 2019 at 20:22):

Oh, yes, sorry, I am not so familiar with Zulip.

view this post on Zulip Miroslav Olšák (Oct 03 2019 at 20:24):

Now it is "hmble", any suggestion for a better name? (why does it actually require a name, I just wanted to contribute to the general discussion)

view this post on Zulip Johan Commelin (Oct 03 2019 at 20:25):

Well, then call it "general discussion"

view this post on Zulip Daniel Selsam (Oct 03 2019 at 20:30):

I don't know; perhaps we should ask people from the community around automated deduction in geometry.

@Miroslav Olšák I am most curious about which (non-synthetic) decision procedures work for which existing problems, e.g. by considering them as nonlinear real arithmetic (NRA) problems.

view this post on Zulip Joe Hendrix (Oct 04 2019 at 20:40):

(deleted)

view this post on Zulip Patrick Massot (Oct 04 2019 at 20:51):

Well, then call it "general discussion"

Maybe I'm not reading carefully enough, but I was under the impression there was a clear geometry thread. If this is not specific enough then it means we give up using Zulip threads.

view this post on Zulip Miroslav Olšák (Oct 04 2019 at 21:21):

Well, then call it "general discussion"

Maybe I'm not reading carefully enough, but I was under the impression there was a clear geometry thread. If this is not specific enough then it means we give up using Zulip threads.

I had several remarks, only one of them is related to geometry (and I don't consider the Windmill problem related to geometry, it is rather combinatorics).

view this post on Zulip Daniel Selsam (Oct 04 2019 at 21:23):

I had several remarks, only one of them is related to geometry

@Miroslav Olšák FYI the recommended style is to use different topics for each question/comment in a batch.

view this post on Zulip Miroslav Olšák (Oct 04 2019 at 21:40):

By the way, the comments teaching me how to use Zulip look irrelevant from the general perspective. I suggest using private messages instead next time. Also, can I at least delete my comments regarding Zulip?

view this post on Zulip Patrick Massot (Oct 05 2019 at 09:31):

@Miroslav Olšák I'm very sorry my message may have sounded a bit aggressive. I'm sure Daniel and the whole grand-challenge team is very happy to read your contributions. Experience on this forum suggests things are a bit easier if we somehow try to separate topics, but there are plenty of counterexamples. So please don't let that issue prevent you from contributing.

view this post on Zulip Brando Miranda (Oct 23 2019 at 19:52):

@Daniel Selsam I am trying to understand the specifications we are trying to pin down in this community. Inspired by a question on the Intermediate Language stream referencing HOList and worrying about the portability of the competition if things get tied to Lean: is the goal of the project also to re-implement something like HOList? How would it be different?

From the comment on that thread/stream it seems that re-implementing HOList would be a pain (I wish I understood why), but I think it would be important to understand the difference and plan things out before going out and re-implementing a complicated system like HOList. What are your thoughts?

view this post on Zulip Daniel Selsam (Oct 23 2019 at 22:53):

From the comment on that thread/stream it seems that re-implementing HOList would be a pain (I wish I understood why), but I think it would be important to understand the difference and plan things out before going out and re-implementing a complicated system like HOList. What are your thoughts?

It is not hard to interface with ML systems. Lean has a tactic framework, with excellent meta-programming support, and also a foreign function interface.

view this post on Zulip Jason Rute (Oct 24 2019 at 00:06):

it seems that re-implementing HOList would be a pain (I wish I understood why)

@Brando Miranda I can try to address (narrowly) what I think would need to happen to reimplement HOList in Lean. I know at least one person here is working on it; I don't know what progress they have made. I'm probably the one who said this is a "pain". I should probably backtrack and say it is doable with a good amount of engineering work, and I hope someone builds it! As I think you have an ML background, I won't try to cover up the ML terminology.

view this post on Zulip Jason Rute (Oct 24 2019 at 00:06):

As I see it, the HOList projects have the following parts that would need to be reimplemented:

  1. A list of theorems: For both training and testing, one needs a list of theorems (and the full context, in some nicely parsable form) to train on.
  2. Proof recording (optional): If one wants to do supervised learning, then one also needs a list of proofs to train on. These proofs will contain the theorem to prove (with the context) as well as the various tactics which have been applied, along with their arguments. This needs to be at some intermediate level which records the name of the tactic and the arguments (so at a higher level than type theory), but probably not at the level of the raw Lean code. I’ve heard from some in the Lean community that the tactic environment could be hacked to provide this information, but I don’t know that it has ever been done. HOL Light has some advantages here: it has a simpler tactic framework (I think), it has a larger library (more training data), it is written mostly by one person (so is more uniform), and HOL Light only uses tactics (whereas Lean uses a mixture of tactics and the type-theoretic framework). However, the ASTactic (CoqGym) and ProverBot9001 projects also used proof recording for Coq.
  3. An interactive environment: If one wants to do reinforcement learning and/or tree search, one needs to be able to interact quickly with the system. For tree search, given a particular state, one needs to be able to try possible tactics, see what the results are, and backtrack if needed (using, for example, beam search). Also, for reinforcement learning, one needs to be able to try out a very large number of scenarios (in this case theorems, either real or synthetic, to prove). This necessitates an even faster back-and-forth between the agent and the system. Google rewrote HOL Light in C++ for this purpose. (The various Coq ML projects don’t use reinforcement learning.)
  4. A system for scoring tactics and tactic arguments: Scoring the tactics can be done as a probability distribution over the tactics (computed by a neural network), but scoring the arguments to these tactics can be a bit more tricky because of the large number of possibilities. HOList has one system for doing this; the two Coq projects have another. I don’t know if either is readily adaptable to Lean.
  5. Access to neural networks and computing power for training and evaluation: The agent will have to compute tactic and argument scores via (graph?) neural networks. Therefore, it needs access to TensorFlow or PyTorch and a distributed computing system.

view this post on Zulip Jason Rute (Oct 24 2019 at 00:06):

Some further comments. One doesn’t need proof recording; instead, one can train solely with theorem statements and reinforcement learning. Conversely, if one doesn’t use reinforcement learning, then one doesn’t need as much speed in the interactive environment. Also, the tree-search agent could live inside Lean (as a tactic), making FFI calls to TensorFlow, say. Alternatively, the agent could be in Python or C++, in which case it would have to guide Lean from the outside. I don’t know which is better.

view this post on Zulip Brando Miranda (Oct 24 2019 at 01:11):

I know at least one person here is working on it.

Awesome! Do you think it's possible to get me in touch with them or their team? Thanks!

view this post on Zulip Brando Miranda (Oct 24 2019 at 01:16):

Google rewrote HOL Light in C++ for this purpose. (The various Coq ML projects don’t use reinforcement learning.)

Are you saying Google rewrote HOL Light (the entire theorem prover; I don't know if that is a lot of work or not, but it sounds like it) only so that they could do RL on HOL Light? (Trying to repeat it back to you to make sure I got it.)

On a very related note, does that mean that for someone to re-implement HOList to make a "LeanList", we would need to re-implement Lean in a language with the performance/speed needed for RL?

I think the fundamental thing I don't understand is how to do the IMO grand challenge without a system like HOList built already. Why wouldn't that need to be a prerequisite?

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:16):

Lean is implemented in C++ already

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:17):

and high performance has always been an objective

view this post on Zulip Brando Miranda (Oct 24 2019 at 01:17):

Lean is implemented in C++ already

So it's already fast enough to do Reinforcement Learning (RL) on? Is that what you're saying?

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:18):

I have no idea what specifically is required for that, but FFI should be sufficient

view this post on Zulip Brando Miranda (Oct 24 2019 at 01:18):

I have no idea what specifically is required for that, but FFI should be sufficient

what does FFI mean?

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:18):

foreign function interface, i.e. calling functions in other languages

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:19):

I am also curious about "Google rewrote HOL Light"

view this post on Zulip Brando Miranda (Oct 24 2019 at 01:20):

I'm curious: for a Foreign Function Interface (FFI), is Lean's faster than Coq's?

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:21):

Lean 3 got an FFI only in the community version, and I haven't used Lean 4's

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:22):

I don't see any reason why FFI should be very slow

view this post on Zulip Reid Barton (Oct 24 2019 at 01:22):

I'm sure Lean 4's will be fast.

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:22):

I guess marshaling of large objects might be a performance penalty

view this post on Zulip Brando Miranda (Oct 24 2019 at 01:23):

Is a Foreign Function Interface (FFI) bi-directional, or is it only powerful from within Lean to, say, Python? What about the reverse?

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:24):

The only way I am aware of for other languages to talk to lean is through the server mode, which uses JSON for message passing
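To make the server-mode option concrete, here is a hedged Python sketch of building requests for Lean 3's JSON-over-stdin protocol. The field names (`seq_num`, `command`, `file_name`, `content`) follow my recollection of that protocol and should be checked against the Lean server documentation before use; the subprocess wiring is shown only as commented-out pseudocode:

```python
import json

def make_request(seq_num, command, **params):
    """Build one line of a JSON-per-line request, in the shape I believe
    the Lean 3 server expects (verify field names against the server docs)."""
    msg = {"seq_num": seq_num, "command": command, **params}
    return json.dumps(msg) + "\n"

# Sketch of driving `lean --server` from Python (untested wiring):
# proc = subprocess.Popen(["lean", "--server"], stdin=subprocess.PIPE,
#                         stdout=subprocess.PIPE, text=True)
# proc.stdin.write(make_request(1, "sync", file_name="scratch.lean",
#                               content="example : 1 = 1 := rfl"))
# proc.stdin.flush()
# response = json.loads(proc.stdout.readline())
```

The JSON round-trip per message is exactly the overhead Mario mentions: every state or tactic result must be serialized and parsed, which is fine for an editor but costly in a tight search loop.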

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:24):

I guess you could also try literally linking with lean as a library, but I've never seen that done and I have no idea if it's doable

view this post on Zulip Brando Miranda (Oct 24 2019 at 01:27):

How does SerAPI (https://github.com/ejgallego/coq-serapi) compare to Lean's foreign function interface (ffi)? Or are they totally different?

view this post on Zulip Reid Barton (Oct 24 2019 at 01:30):

It looks like something different

view this post on Zulip Bryan Gin-ge Chen (Oct 24 2019 at 01:31):

SerAPI looks more like Lean's server mode, at least from my quick skim of the GitHub README. At least all of Lean's editor integration is done via lean --server.

view this post on Zulip Bryan Gin-ge Chen (Oct 24 2019 at 01:32):

Though as Mario said, the Lean server mode uses JSON and it looks like SerAPI is doing something much more sophisticated.

view this post on Zulip Reid Barton (Oct 24 2019 at 01:33):

The Lean FFI lets you call C functions from a compiled Lean program. For example handle.close is implemented by lean_io_prim_handle_close (okay, bad example!)

view this post on Zulip Reid Barton (Oct 24 2019 at 01:33):

or Int.add is implemented by lean_int_add

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:34):

can you pass or return objects?

view this post on Zulip Reid Barton (Oct 24 2019 at 01:35):

Like Ints? :slight_smile:

view this post on Zulip Reid Barton (Oct 24 2019 at 01:35):

though probably Int is itself some kind of magic when compiled

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:35):

As long as there is no message passing, I guess there is no reason for much performance overhead with the FFI; total runtime should be dominated by the C function itself

view this post on Zulip Mario Carneiro (Oct 24 2019 at 01:36):

but if you have some huge array you have to pass in, that could hurt if the FFI layer isn't done properly

view this post on Zulip Reid Barton (Oct 24 2019 at 01:40):

Based on the other performance engineering that has already gone into Lean 4, I'm confident that it will be at least possible to do efficient FFI

view this post on Zulip Reid Barton (Oct 24 2019 at 01:40):

The @& in the type of Int.add means that the argument is borrowed, I think

view this post on Zulip Reid Barton (Oct 24 2019 at 01:42):

https://github.com/leanprover/lean4/blob/master/library/Init/Data/Array/Basic.lean#L75 makes me think you can do zero-copy FFI with Array (assuming the C side is well-behaved of course)

view this post on Zulip Reid Barton (Oct 24 2019 at 01:43):

It would be cool if there was an FFI to Rust that could cooperate with the Rust types

view this post on Zulip Reid Barton (Oct 24 2019 at 01:43):

although I have no idea whether that is even possible

view this post on Zulip Brando Miranda (Oct 24 2019 at 01:44):

Is this high-performance foreign function interface (FFI) issue the reason the Coq projects (CoqGym, GamePad, etc.) don't do Reinforcement Learning (RL)? Anyone know?

view this post on Zulip Reid Barton (Oct 24 2019 at 01:51):

I don't know the answer to that, but this FFI business is relevant because it means it is actually viable to build your ML system in Lean, which gives you direct access to the Lean tactic state and so on as well. For other theorem provers, you'd want to use a different programming language and then you have the problem of importing/exporting data like the tactic state. (Although I'm not sure why you couldn't just use OCaml as the host language for the theorem provers written in it.)

view this post on Zulip Brando Miranda (Oct 24 2019 at 02:28):

Well, most serious ML researchers use Python, so that's why; I believe most of us don't know OCaml. (I'm learning it myself now, though, because I predicted it might be useful, as you have pointed out. But even if I write my ML in OCaml, few people can build on it if it's all in OCaml.)

view this post on Zulip Brando Miranda (Oct 24 2019 at 02:30):

but this FFI business is relevant because it means it is actually viable to build your ML system in Lean, which gives you direct access to the Lean tactic state and so on as well.

Oh interesting! But would that mean I can build an ML system inside of Lean or inside of Python?

view this post on Zulip Brando Miranda (Oct 24 2019 at 02:32):

For other theorem provers, you'd want to use a different programming language and then you have the problem of importing/exporting data like the tactic state.

Do you mind expanding on what this means? In particular, why does one need an external programming language for most ITPs? (Perhaps a few examples would be nice.) Also, why isn't this a problem in Lean? Is it because Lean is a programming language itself, or because of the foreign function interface (FFI)?

(Perhaps I will go and play with Lean's foreign function interface (FFI) so that it's less wishy-washy in my head, and get my hands dirty.)

view this post on Zulip Jason Rute (Oct 24 2019 at 02:35):

As for rewriting HOL Light in C++: I tried to look it up. It seems that the first HOList paper and the website mention that their modified form of HOL Light is called DeepHOL. Looking at the code, I think they only rewrote the kernel in C++, but I am not certain. The whole thing is usable as a Docker container, where one can treat it as a black-box theorem-prover interface. (See the website for how to use it.)

view this post on Zulip Reid Barton (Oct 24 2019 at 02:39):

Other theorem provers (including Lean 3, really) either aren't programming languages at all, or aren't adequate as programming languages for the task (though I must say I don't understand the situation with Coq, in particular)

view this post on Zulip Jason Rute (Oct 24 2019 at 02:43):

I think one would need the following in a project like this.

  • A reinforcement learning master algorithm, which decides which problems to attempt, tells the agent to try to solve them, records the results, and uses these results to train the neural network. This would also probably be a heavily parallelized application.
  • A search algorithm, which tries to solve a particular problem (repeatedly querying a neural network as an oracle).
  • The part which actually runs the neural network. (For speed, it probably needs to batch up calls to the neural network and send them together to make efficient use of the GPU.)
  • The part which trains the neural network.

I think the last two need to be in Python or C++. The tree-search agent could be in Lean with FFI to the part which calls the network, but I don't know how well a purely functional language does with (non-depth-first) tree search. The overall master agent could be written in Lean, but I assume it would make more sense in something else.
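The batching point about the neural-network component can be sketched in a few lines of Python; `score_batch` here is a toy stand-in for a (hypothetical) GPU-backed scorer, and all names are illustrative:

```python
def batched_scores(states, score_batch, batch_size=32):
    """Score many search states by grouping oracle queries into batches,
    so a GPU-backed scorer sees full batches instead of one state at a time."""
    scores = []
    for i in range(0, len(states), batch_size):
        scores.extend(score_batch(states[i:i + batch_size]))
    return scores

# toy stand-in for a neural scorer: score = length of the goal string
fake_scorer = lambda batch: [len(s) for s in batch]
```

In a real system the search algorithm would accumulate pending goal states and flush them through `batched_scores` together, amortizing the cost of each network call.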

view this post on Zulip Jason Rute (Oct 24 2019 at 02:45):

Also, I guess speed doesn't matter as much when you have massive parallelism. (At my job, when we need something done fast, we just reserve more AWS instances. [Well, in theory. In practice there always seems to be a bottleneck, or it is too expensive.])

view this post on Zulip Jason Rute (Oct 24 2019 at 12:24):

I thought a bit more about tree search algorithms in Lean. In general, reasonably efficient search algorithms can be implemented with maps (hash tables or other lookup data structures) and priority queues (heaps). I know Lean 3 doesn't have a priority queue/heap, but I found a good one in the book Purely Functional Data Structures. A few weeks ago I tried implementing both their BinomialHeap and SplayHeap in Lean. The SplayHeap is about an order of magnitude faster than the BinomialHeap and can do heap sort in Lean as fast as Lean's merge sort. (I noticed Lean 4 implements binomial heaps in the base library, and I wonder if splay heaps would be better.) So maybe if Lean is decked out with the fastest purely functional data structures (or uses FFI to call non-functional C++ data structures), then it wouldn't be a large bottleneck to have the search agent live in Lean. Then one would have a powerful, fast search tactic in Lean (which could be guided by FFI calls to a pre-trained neural network).
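As a language-agnostic illustration of the search agent described here: best-first search needs only immutable states plus a priority queue, which is why persistent heaps matter. A minimal Python sketch, with a toy numeric "prover" standing in for Lean and `score` standing in for a neural oracle (all names illustrative):

```python
import heapq

def best_first_search(initial, goal_test, successors, score, budget=1000):
    """Generic best-first search: `successors(state)` returns child states,
    `score(state)` is a heuristic (lower = more promising), e.g. a network's
    negative log-probability.  States are immutable, so "backtracking" is
    just popping a different entry off the heap."""
    frontier = [(score(initial), 0, initial, [initial])]
    counter = 1  # tie-breaker so states themselves are never compared
    seen = set()
    while frontier and budget > 0:
        budget -= 1
        _, _, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path
        if state in seen:
            continue
        seen.add(state)
        for child in successors(state):
            heapq.heappush(frontier, (score(child), counter, child, path + [child]))
            counter += 1
    return None
```

For instance, "proving" 10 from 1 with moves n+1 and 2n, scored by distance to the goal, finds the path [1, 2, 4, 8, 9, 10]. The same loop with tactic applications as `successors` is essentially the search tactic sketched above.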

view this post on Zulip Brando Miranda (Oct 24 2019 at 14:07):

Then one would have a powerful fast search tactic in Lean (which could be guided by FFI calls to a pre-trained neural network).

What worries me is the "pre-trained" neural net (NN) part. I think if Lean is going to be used as more than just a theorem prover, it should allow for training of the NN. Another thing to consider is that it's going to be hard for people to adopt the challenge, or the ITP Lean environment/dataset, if it all lives in a new programming language that is not "standard" like Python. I don't think it's going to be able to take off like the famous large-scale computer vision competition/dataset (ImageNet). It has to be taken into account that if every competitor is forced to learn a new programming paradigm (like functional programming (FP)), and not only a new programming language, it might take some time for people to really use it (or perhaps people won't). It takes time to get good at a new programming language, especially if it's a new paradigm. Having things in Python will imho make any ITP challenge more likely to thrive.

view this post on Zulip Jason Rute (Oct 25 2019 at 00:21):

@Brando Miranda I think in some sense one needs both: to (1) be able to run a search algorithm inside Lean with a NN, and (2) guide Lean from the outside in something like Python. Setting aside the IMO challenge and just thinking about improving Lean: from the perspective of a Lean user, they want whatever system one is building to be usable in Lean. And this is really a problem with the other tools out there; I don't know if any of the AI/ITP systems currently out there are usable and useful to the practitioners of that ITP system. However, if Lean had a HOList-like tactic called lean_ist (which admittedly would involve some additional setup), then they could write by lean_ist. The system would behave similarly to the library_search tactic, outputting a proof which can be pasted into Lean.

view this post on Zulip Jason Rute (Oct 25 2019 at 00:21):

On the flip side, something that inhibits the growth of these AI for ITP systems is that it is a major engineering feat to link your AI library to your theorem prover (and just as large of a feat to learn the intricacies of the logic). One thing that has really increased neural network research progress has been environments which are easy to spin up and play with. These "gym" environments allow you apply the same training algorithm to many different problems which have the same interface. I think a very useful research project would be the following: Go through the other AI for theorem proving projects out there, and find a common gym-like interface for interactive theorem provers and related systems, including tableau calculus, constraint solvers, QBF solvers, SMT solvers with tactics, non-classical proof systems. (I can point one to over a dozen papers, each in a separate system.) As far as I see they have the following uniform framework:

  • A term and formula language
  • A local goal state: what one is trying to prove at the moment (as an ordered list of formulas in some formal language)
  • A premise list: all the possible previously proven theorem statements one could use (which needs to be significantly narrowed down in a process called “premise selection”)
  • Tactics or inference rules: a fixed finite set of rules one can apply to one's current goal state
  • Tactic parameters: the possible parameters that can be added to the above tactics. (This is by far the most inconsistent and flexible part of the framework. These parameters can include numbers, terms, and premises from the premise list.)
  • A training and test set: formulas to prove (and possibly proofs for supervised learning)
  • Application: one needs to be able to quickly apply a rule/tactic
  • Persistence/backtracking: so that any tree search algorithm can be applied, one needs a notion of backtracking. (Practically, this means that states need to be persistent: if one applies a tactic, it creates a new state without changing the old state. Think immutable data structures.)
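The uniform framework above can be distilled into a small interface sketch. This is purely hypothetical Python: none of these class or method names come from HOList, CoqGym, or Lean, and the toy instantiation exists only to show the persistence requirement in action:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen => immutable states, so backtracking is free
class GoalState:
    goals: tuple  # ordered goals, each a formula in some term language

class ProverEnv(ABC):
    """Hypothetical gym-like ITP interface mirroring the bullet list above."""
    @abstractmethod
    def initial_state(self, theorem): ...
    @abstractmethod
    def tactics(self):
        """Fixed finite set of applicable rules/tactics."""
    @abstractmethod
    def premises(self, state):
        """Candidate previously proven lemmas (input to premise selection)."""
    @abstractmethod
    def apply(self, state, tactic, args):
        """Return a NEW state (or None on failure); never mutate `state`."""
    def solved(self, state):
        return len(state.goals) == 0

class ToyEnv(ProverEnv):
    """Toy instantiation: a "goal" is a nonnegative integer, the only tactic
    "dec" decrements it, and a goal of 0 is discharged."""
    def initial_state(self, theorem):
        return GoalState(goals=(theorem,))
    def tactics(self):
        return ["dec"]
    def premises(self, state):
        return []
    def apply(self, state, tactic, args):
        if tactic != "dec" or not state.goals:
            return None
        g, rest = state.goals[0], state.goals[1:]
        return GoalState(goals=rest if g <= 1 else (g - 1,) + rest)
```

Because `apply` returns a fresh frozen state, a tree search can keep any number of earlier states alive and resume from them, which is the persistence/backtracking bullet made literal.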

view this post on Zulip Jason Rute (Oct 25 2019 at 00:21):

Now not all systems use all of the above. A simple logical system, e.g. QBFs, may not use premise selection or tactic parameters. A fully automatic system (e.g. an ATP like E-prover) may only have one tactic (solve) and the challenge is just premise selection. Some systems also don’t have the backtracking, but that severely limits how effectively one can search.

view this post on Zulip Jason Rute (Oct 25 2019 at 00:22):

Now, HOList has basically created a system like this in DeepHOL (although it is not very well documented), which they make easier to use by burying it in a Docker container, so it just becomes a black box (a gym). I also think they had to do a lot of work to make the persistence/backtracking work, but that is also not documented well. I think the CoqGym and GamePad systems also try to do something similar for Coq. In some systems, like Metamath, it shouldn’t be terribly hard to just rewrite the logic to make such an environment. The question is: can Lean be abstracted into a system like this which is fast (and, more importantly, satisfies the persistence/backtracking requirement)?

view this post on Zulip Jason Rute (Oct 25 2019 at 00:22):

Getting back to the IMO challenge: I feel the goal is different from developing a training algorithm for an arbitrary neural theorem prover. However, I agree that it isn’t clear whether this project will attract those outside of the established Lean community or Microsoft Research, because of the barrier to entry. However, you are here, @Brando, so maybe it is working. :)

view this post on Zulip Mario Carneiro (Oct 25 2019 at 00:27):

lean has always had a backtracking tactic state

view this post on Zulip Mario Carneiro (Oct 25 2019 at 00:27):

speed might be an issue, but possibly lean 4 has solved that problem

view this post on Zulip Jason Rute (Oct 25 2019 at 00:28):

But is it usable from the outside? Could it be wrapped in a black box with a set number of tactics (so from the outside it is just like a chess board with a set number of possible moves)?

view this post on Zulip Mario Carneiro (Oct 25 2019 at 00:32):

I would really like a C++ API for doing this

view this post on Zulip Mario Carneiro (Oct 25 2019 at 00:33):

You can do it with lean --server but there is far too much overhead involved


Last updated: Aug 05 2021 at 04:14 UTC