Zulip Chat Archive

Stream: general

Topic: AI for Math fund

Kim Morrison (Dec 05 2024 at 22:50):

This thread is for discussion of the #announce > AI for math fund announcement.

Kim Morrison (Dec 05 2024 at 23:12):

I really hope that both applications and funding decisions will focus on building practical tools, and will try to ensure that the funded projects produce widely used tools. (i.e. not just research notes about tools that aren't usable in practice...)

Kim Morrison (Dec 05 2024 at 23:13):

There is so much basic infrastructure that we are missing here. (I know that people are working on some of these already.)

Kim Morrison (Dec 05 2024 at 23:13):

A generic Lean framework for talking to LLMs. (e.g. with a pluggable model backend, and a uniform frontend for running chat sessions (in a monad!) and handling the task of extracting code blocks, running them in Lean, and extracting the messages/goal states ready for the agent to hand back to the model).
A #formalize command built on top of that.
A general purpose library for Monte Carlo tree search for proofs.
A replacement for the past-its-use-by-date Lean REPL.
Python bindings for a REPL.
Data extraction tools. We have lean-training-data, LeanDojo, and Pantograph, but we could do better!
A tool for "indexing sorries", e.g. that can go through branches of github repositories, and record the locations (along with git SHAs and dependency/toolchain information) of sorrys, in a way that can be reproducibly "replayed" (preferably with direct integration with a good REPL, so you can just "load the state of a sorry" with a single command).
General purpose premise selection tools
Hammers
Improving Plausible (Lean's counterexample generating library) from a proof of concept to a real tool.
Designing an open, extensible, automatically generated "library". (Containing statements both generated by autoformalization and "fuzzing" of existing open statements, and containing proofs and disproofs of statements, along with provenance information for statements and proofs.)
Good data sets for training/testing/benchmarking specialized models that try to answer "how likely is this statement to be true" and "how long is the proof of this statement likely to be".

Kim Morrison (Dec 05 2024 at 23:13):

If I had infinite time this is what I would work on, and if I had infinite money this is what I would hire for. :-)

Jason Rute (Dec 06 2024 at 00:36):

I second usable tools. Ideally, such projects, if for Lean, should have a good mix of folks with skills across AI, Lean metaprogramming, and Lean practitioners with usable tools in mind. Take premise selection for example from Kim's list above. Let's say you want to make a useful premise selection tool for Lean users. On the AI side, there are questions about how to design a good premise selection tool that is fast enough, doesn't break the bank, and is robust to new definitions and lemmas being added or changed. On the metaprogramming side, there are different questions: If you are doing a key-query system, where are the keys stored? What do you do if the premises change? Can your system work across recent versions? Is there a mechanism for loading previously computed keys? How about computing keys for new definitions? How about computing keys for definitions in the current file? How about computing the keys during CI? (This is all starting to look like .olean files, by the way. Can one just piggyback on that system?) If instead, it is more practical for the premise selector to live on a server (say because they are too big or because it can be integrated with LeanSearch), how does that work? Is there a LeanSearch-like API, but one that can be prompted with the current theorem? Can this be integrated with the tactic framework or with the vscode plugin or language server? Can the Lean/Mathlib version be communicated automatically? On the user experience side, there are questions about what premise selection is needed for, if it needs to be a tactic or plugin, if it would interfere with privacy to have it hosted on a server, and if one needs to only pick premises in the current environment or also outside the environment.

Jason Rute (Dec 06 2024 at 00:44):

I worry a lot about the current tools being built (including by me). There is so little taken into account about user experience or practical ITP integration.

GasStationManager (Dec 06 2024 at 03:17):

My interest is mainly in AI for code generation, but do have a shared interest in building AI-friendly infrastructure & tools for Lean.
Right now I'm just writing some tools as I need them. E.g. a plugin to allow LLMs to invoke lean. I try to make them generally useful, but they are still designed with my use cases in mind. The hope is that by open sourcing them, folks can build on these initial ideas, perhaps add features to suit their needs.

Shreyas Srinivas (Dec 06 2024 at 11:01):

I think the reason usable tools don't get attention is because these types of contributions are not valued in academic hiring. Until academia changes the way it evaluates research contributions (especially in theoretical areas), I don't see this changing too much.

Shreyas Srinivas (Dec 06 2024 at 11:05):

Even if a non-academic person gets the grant, who is going to maintain these tools and how will they get paid, once the grant expires?

Shreyas Srinivas (Dec 06 2024 at 11:18):

(deleted)

Steven Clontz (Dec 06 2024 at 12:28):

Different institutions count beans differently, but winning a grant like this would count as professional development at some places. Of course, that doesn't scale. If there was an "overlay" journal that published (short) papers representing authors' work on math infrastructure, it would help a college T&P committee recognize this labor as scholarship.

Shreyas Srinivas (Dec 06 2024 at 13:24):

On the one hand, yes it could help. On the other hand, the term "scholarship" needs to be defined more broadly

Steven Clontz (Dec 06 2024 at 20:32):

Yeah, the journal solution is a polyfill for legacy metrics to evaluate faculty; helping institutions recognize modern academic labor more broadly is an important (and harder) problem as well.

Leni Aniva (Dec 06 2024 at 23:56):

Pantograph can be a successor of Lean REPL. (Disclaimer: I'm the main developer of Pantograph)

Its Python bindings support MCTS (thanks to the contribution of one researcher from Cambridge)
Its data extraction unit is very extensible, and now I'm just adding new extraction features based on user requests
It has a uniform backend for running tactics and data extractions
It can "load sorries" and that was a main contribution of our paper

Pantograph is a vital component of my main research project and I don't see its development and maintenance could end as long as that project is active. I could also hand off maintenance of the project to other researchers in the Stanford Centaur Lab.

Leni Aniva (Dec 07 2024 at 00:05):

Jason Rute said:

I second usable tools. Ideally, such projects, if for Lean, should have a good mix of folks with skills across AI, Lean metaprogramming, and Lean practitioners with usable tools in mind. Take premise selection for example from Kim's list above. Let's say you want to make a useful premise selection tool for Lean users. On the AI side, there are questions about how to design a good premise selection tool that is fast enough, doesn't break the bank, and is robust to new definitions and lemmas being added or changed. On the metaprogramming side, there are different questions: If you are doing a key-query system, where are the keys stored? What do you do if the premises change? Can your system work across recent versions? Is there a mechanism for loading previously computed keys? How about computing keys for new definitions? How about computing keys for definitions in the current file? How about computing the keys during CI? (This is all starting to look like .olean files, by the way. Can one just piggyback on that system?) If instead, it is more practical for the premise selector to live on a server (say because they are too big or because it can be integrated with LeanSearch), how does that work? Is there a LeanSearch-like API, but one that can be prompted with the current theorem? Can this be integrated with the tactic framework or with the vscode plugin or language server? Can the Lean/Mathlib version be communicated automatically? On the user experience side, there are questions about what premise selection is needed for, if it needs to be a tactic or plugin, if it would interfere with privacy to have it hosted on a server, and if one needs to only pick premises in the current environment or also outside the environment.

I think the development of Lean tooling will be distributed. Maybe one tool handles data extraction, one handles proof search and execution, and one handles interactions in IDEs.

Jason Rute (Dec 07 2024 at 00:40):

Distributed is fine and possibly good, but we also need good interoperability. So far that has been a huge strength of Lean and Mathlib. If you have lots of haphazard libraries that don't play well together they are going to be a pain to work with.

GasStationManager (Dec 07 2024 at 20:33):

I think the people with the most incentive to build tools are those building towards some use case beyond the tools, and need to be able to use the tools. Some of them (like me) have background in a different field (AI, or math) but are not necessarily Lean experts. And not necessarily aware of the other use cases. So an interesting question may be whether they can be paired up with the experts in a productive way

Leni Aniva (Dec 07 2024 at 23:40):

GasStationManager said:

I think the people with the most incentive to build tools are those building towards some use case beyond the tools, and need to be able to use the tools. Some of them (like me) have background in a different field (AI, or math) but are not necessarily Lean experts. And not necessarily aware of the other use cases. So an interesting question may be whether they can be paired up with the experts in a productive way

Rather than pairing people up why not have the users of automation tools file feature requests in git?

GasStationManager (Dec 08 2024 at 02:37):

What I had in mind was scenarios where the tool in question might not exist; the would-be user of the tool may be happy to try implement something, but would also benefit from help from more experienced people. But yeah likely a good starting point would be a feature request somewhere.

Jason Rute (Jan 09 2025 at 18:04):

I have no connections to this, but a reminder that this is due tomorrow.

Last updated: May 02 2025 at 03:31 UTC