Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: AI + metaprogramming


Daniel Windham (Oct 26 2023 at 20:11):

Let me know if there's already discussion on this, I'm new to the Lean community. (Hi!)

How do folks expect Lean's metaprogramming capabilities will be pros and cons to automation? For instance, I expect this extra power to help integrate automation capabilities into the heart of Lean's capabilities, but to hinder the ability for LLMs to reason about Lean code that involves custom tactics/syntax/etc. What are some concrete pros and cons? And are there changes to Lean's metaprogramming capabilities that could make Lean more accessible to automation?

FYI @Nada Amin, whose question I'm paraphrasing here.

Adomas Baliuka (Oct 26 2023 at 22:07):

When using reinforcement learning or similar techniques, I expect "cheat detection" to play a big role. D. J. Bernstein's recent paper, pdf also mentions this (chapter 4).

I write "cheat detection" instead of "soundness" because it's not only about the underlying type theory or the kernel but about "actually not being able to prove False against the checker that gives feedback to the language model".

Assuming that progress in machine learning will continue and absent massive capital that cound influence the direction there, I expect to see significant progress in the near future by using and building upon existing models and making sure the checker faced by any training routine isn't cheated.

Metaprogramming certainly should make this easier and potentially increase both the ability of humans to help models be effective by manually "providing tools that directly address systems' weaknesses" as well as perhaps allowing models to bootstrap themselves. It also of course enables effective use of these tools by humans.

I don't expect metaprogramming or the use of tactics to significantly hinder progress or be hard to deal with. It reduces the necessary context length and actually makes proofs easier to understand "heuristically". I expect this to be true both for humans and for language models. The ability to use tactics makes proving things easier, which is also true for both humans and language models.

Jason Rute (Oct 27 2023 at 00:35):

It is an interesting question. I think in many ways I hold out more hope than I used to. With LLMs and other foundation models, they are really flexible and have a better chance of learning from new code than more rigid methods. Maybe in the near future LLMs (or other models) will be able to, say, read the code of a new tactic. But if not, hopefully, they will (through long contexts, retrieval, and other methods) be able to see how new definitions, new tactics, new code, etc are used and documented. This will let them adapt. Further, methods like reinforcement learning and continual fine-tuning are also options to keep a model up-to-date with new information.

Jason Rute (Oct 27 2023 at 00:35):

Also, even if a model can't use a new tactic, is it necessary for it to do so? A new tactic may only be a convenience?

Jason Rute (Oct 27 2023 at 00:35):

Also, speaking of new tactics, what if a model could write its own tactics. Say it finds a pattern in proofs, and then uses Lean's metaprogramming to write a custom tactic. This would really be the holy grail of machine learning, i.e. machine learning methods that learn fast, simple, and explainable algorithms. We are nowhere near that, as far as I can see, but it is still a goal to keep in mind.

Jason Rute (Oct 27 2023 at 00:35):

I should also caution that AI ≠ LLMs. I would even say AI ≠ neural networks. LLMs are great. They are powerful, and most importantly, they have hit this point where there is a critical mass of tools and practitioners to make them easy to use. However, it is not at all clear that LLMs are state-of-the-art in ITP-based theorem proving, including Lean, Coq, Isabelle, and HOL Light. The problem is that non-LLM approaches and LLM approaches have not been properly compared against each other. Case in point, the main non-LLM AI tools are in HOL Light and Coq (including the HOList framework, ProverBot9001, and the various CoqGym solvers) and the main LLM tools are in Isabelle and Lean. This makes it really difficult to compare. Also, LLM tools require a lot more resources. The LLM results in papers wouldn't be nearly as good on a laptop, and if run on a server they would be prohibitively expensive when doing full-scale proof search. While the non-LLM solvers often are benchmarked on a CPU.

Jason Rute (Oct 27 2023 at 00:35):

Finally, non-LLM tools (or even LLM tools with custom architectures) have an ability that off-the-shelf LLM tools don't have, which is to interact with different forms of data, like the underlying tree structure of the prof terms or all the theorems in the environment. This is where Lean's metaprogramming will shine, in that tools (both LLMs and non-LLMs) can be hooked up not only to the user-facing Lean code but also to the full internals of Lean. The sky is the limit to what creative people can do with that. So I think in the end metaprogramming is a win since it lets us quickly build tools to interact with AI agents.

Jason Rute (Oct 27 2023 at 00:43):

As for what is needed, I'm at this stage now where I don't think it needs to be determined in advance. As people build things they will get to a block in the road. When working on PACT one block was that we didn't have a nice way inside the tactic monad to run tactic commands provided as text, like "simp [nat.foo_bar]". Ed Ayers stepped in and modified Lean 3 (community edition) at the time. Now with Lean 4, I keep hearing stories of the developers rapidly improving Lean for power users with special needs, and it is clear the developers support AI. If you (try to) build it, they will come (help you implement the tools you need).

Siddhartha Gadgil (Oct 27 2023 at 02:12):

Clear gains from MetaProgramming is the ease of integration and of data extraction. For example it is efficient to filter outputs to only ones that elaborate (i.e., compile). Also it is easy to extract various kinds of data. For instance, by default Lean suppresses the types of many variables, but we can write custom code to output these.

Jason Rute (Oct 28 2023 at 20:39):

@Daniel Windham On later thought I do think my first response might have been a bit flippant. As I think about it more having a more expressive metaprogramming language can present challenges, and it is possible we are already seeing those challenges in Lean 4. A large many approaches so far in this space have been along the lines of tactic predictor plus tree search. This works ok, but only if the tactics are relatively atomic as they often are in Lean 3. But already in Lean 4, we see tactics like the new induction ... with tactic which can turn a long and complicated induction proof into a one-tactic proof where that one tactic is large and composed of multiple tactics inside. For example, a theorem in Lean core:

theorem div_le_self (n k : Nat) : n / k  n := by
  induction n using Nat.strongInductionOn with
  | ind n ih =>
    rw [div_eq]
    -- Note: manual split to avoid Classical.em which is not yet defined
    cases (inferInstance : Decidable (0 < k  k  n)) with
    | isFalse h => simp [h]
    | isTrue h =>
      suffices (n - k) / k + 1  n by simp [h, this]
      have hK, hKN := h
      have hSub : n - k < n := sub_lt (Nat.lt_of_lt_of_le hK hKN) hK
      have : (n - k) / k  n - k := ih (n - k) hSub
      exact succ_le_of_lt (Nat.lt_of_le_of_lt this hSub)

Now, @Daniel Selsam once pointed out that this can be decomposed into separate tactics using something like induction n using Nat.strongInductionOn with | ind n ih => ?_ but of course that would lead to messier final output. I don't know how @Sean Welleck or @Kaiyu Yang have handled this with Lean 4 so far. It might so far be less of an issue since Mathlib is largely a port of Lean 3 code.

Jason Rute (Oct 28 2023 at 20:39):

But even in Lean 3 it wasn't uncommon to see long calc tactics, tactic combinators like try and repeat and ; and term proofs inside tactic proofs inside term proofs. Proofs are much more messy than the simple model of a linear proof that these tree search methods often suppose. And even if you can always fall back to thinking of a complicated nested tactic as one tactic, it doesn't give the model much opportunity to backtrack. Of course, this is actually the issue with code generation as well which doesn't have this convenient notion of a tactic or a proof state. So I think we will see more mergers of ideas in code generation and proof generation. I mean, only half of what Lean users do is proof writing anyway. A lot of it is structuring the overall code including definitions, lemmas, and writing/using APIs.

Jason Rute (Oct 28 2023 at 20:39):

As for what is needed in metaprogramming I think it is two things. I don't know if they are already in Lean. One, we need the ability to store additional meta information alongside declarations in the environment. For example, lots of machine learning algorithms assign vector embeddings to theorems for the purpose of premise and argument selection. To make this practical, one needs to be able to calculate these embeddings on the fly and store them. It isn't obvious even where or how they should be stored, but it might make sense to put them in the Lean environment. And even if not, if there was a standard mechanism, it would help make this sort of research (which I think is largely academic) practical.

Jason Rute (Oct 28 2023 at 20:39):

Second, I think we need ways to simultaneously interact with the code of a file as well as the Lean internals (like the proof state, but also other things in the environment that the user doesn't typically see). I think this is already largely possible through the new language server, but it would be good to have more examples for researchers.


Last updated: Dec 20 2023 at 11:08 UTC