Zulip Chat Archive

Stream: general

Topic: nlp for lean?


Daniel Selsam (Jun 25 2021 at 01:56):

Hi general. I have been speaking with NLP researchers who wonder whether tools from semantic parsing, information extraction, and (assisted, not automatic) machine translation may be able to accelerate formalization. I see two high-level possibilities:

  1. interactive translation (i.e. fancy somewhat-semantically-aware autocomplete, for both statements and proofs)
  2. hint extraction (i.e. extract useful snippets from informal proofs to feed to hint-friendly solver)

For (2), the info-extraction could be entirely untrusted so it need not be interactive. For example, a system could extract multiple best-effort guesses at which variables to induct on, intermediate steps, existential witnesses, lemmas referenced, etc. There are certainly some kinds of solvers that could be very effectively with even moderately-reliable hints of this form, though no such solver exists yet for Lean.

High-level ideas aside, I don't have a good enough sense of the actual formalization process in mathlib to have an informed opinion about what is likely to be useful in practice. As mathlib grows, are most formalizations increasingly isomorphic to existing informal texts? In other words, is "translation" even a reasonable analogy for most projects? Roughly what percentage of the difficulty is in formalizing statements (and other non-machine-checkable components) versus proofs? Any other thoughts/concerns/doubts/suggestions? Thanks in advance.

Patrick Massot (Jun 25 2021 at 06:28):

I find it really hard to tell whether such tools would be useful without any idea of how well they would work. If they are useful one every 100 proofs then we won't bother asking (this is my experience with gptf for instance, typing the tactic name and waiting for it to run is simply distracting since it works so rarely). The example "best-effort guesses at which variables to induct on" and "existential witnesses" look completely useless to me. If we already have a paper proof then fetching this information takes up a negligible amount of time. The "intermediate steps" idea could be useful if the system manages to correctly state those steps in Lean. And "lemmas referenced" would be useful if the system actually finds the lemmas in mathlib or better, manages to find very related lemmas even if they are not exactly the actual lemma we need.

Patrick Massot (Jun 25 2021 at 06:30):

As mathlib grows, are most formalizations increasingly isomorphic to existing informal texts?

I would say no. We're not yet at this stage. There is simply too much missing ground work. And every new theory starts with dozens of lemmas and intermediate constructions that simply have no analogue in the real world. That's why I'm so skeptical of all those auto-formalization dreams.

Patrick Massot (Jun 25 2021 at 06:31):

Roughly what percentage of the difficulty is in formalizing statements (and other non-machine-checkable components) versus proofs?

I think formalizing statements is still taking up most of the time, except when computations are involved in the proof. Proofs by computation are still extremely painful, for lack of automation.

Daniel Selsam (Jun 25 2021 at 14:28):

@Patrick Massot Thanks for the thoughtful response. I share your general skepticism of auto-formalization, especially for statements. Maybe the better question would be: NLP researchers are excited about Mathlib and want to help -- can they? I am only trying to be a bridge for them in this thread.

If they are useful one every 100 proofs then we won't bother asking

Let's say in the thought experiment for (1), it is instantaneous and doesn't require asking each time, like visual studio's intellicode or recent translation-assistance tools. (I was imagining (2) would be done in batch-offline mode.) For (1), let's also say that it is connected to a good info-retrieval system so it can find lemmas for you based on both the current context and vague allusions in the informal text.

I find it really hard to tell whether such tools would be useful without any idea of how well they would work.

Hard to know at this stage. Auto-translate for natural language is not reliable enough for high-stakes contexts, but I have been told that in most contexts, human translators generally prefer fixing-up/interacting-with auto-guesses than translating from scratch. But I also guess that a lot of NL translation is literally typing-limited, and even the translation-like parts of formalizing probably have >1 OOM more thought-per-keystroke in general.

As mathlib grows, are most formalizations increasingly isomorphic to existing informal texts?

I would say no. We're not yet at this stage. There is simply too much missing ground work. And every new theory starts with dozens of lemmas and intermediate constructions that simply have no analogue in the real world.

To what extent do you think this stage will come eventually? I remember that for Feit Thompson, 80% of the lines were preliminaries, but that there was eventually a "translation-like" phase converting 250 human pages into Coq with a 4:1 de Bruijn ratio.

I think formalizing statements is still taking up most of the time

Interesting.

Proofs by computation are still extremely painful, for lack of automation.

This sounds like an oxymoron on first read -- non-automated proofs by computation. Can you please point me to a Zulip thread explaining this issue? Thanks.

Patrick Stevens (Jun 25 2021 at 17:04):

A non-automated proof by computation is e.g. a proof that x < y by showing that x < 65/100 and 65/100 < y. See, for example, https://github.com/leanprover-community/mathlib/blob/6d2e589f73961949e6a430f926ce64f1575eae72/src/number_theory/bertrand.lean#L539 which is a rather awful proof I did recently that 2/3 ≤ log 2. Absolutely trivial by decimal expansion, but you have to show that your expansions are sufficiently precise. (I'm quite sure it's possible to do better than how we ended up doing it, but that's the class of "extremely painful" we're talking about, I think.)

Heather Macbeth (Jun 25 2021 at 17:21):

Here are a couple of examples (taken from those I either wrote myself or read very closely) of computational proofs which are rather painful:

Heather Macbeth (Jun 25 2021 at 17:30):

And, again taking a random example I wrote myself, I think #8012 is a good example of the opposite phenomenon described by Patrick, where formalizing the statements takes up most of the time. I think it would be quite difficult for an AI (or even, say, someone without enough "mathematical maturity") to come up with a Lean statement for the lemma I have docstring'd as,

/-- If a conjugation-invariant subalgebra of `C(X, ℂ)` separates points, then the real
subalgebra of its purely real-valued elements also separates points. -/

Patrick Massot (Jun 25 2021 at 18:04):

I meant what Heather wrote.

Patrick Massot (Jun 25 2021 at 18:07):

You can also have a look at the proof of this.

Bhavik Mehta (Jun 25 2021 at 21:39):

Patrick Stevens said:

A non-automated proof by computation is e.g. a proof that x < y by showing that x < 65/100 and 65/100 < y. See, for example, https://github.com/leanprover-community/mathlib/blob/6d2e589f73961949e6a430f926ce64f1575eae72/src/number_theory/bertrand.lean#L539 which is a rather awful proof I did recently that 2/3 ≤ log 2. Absolutely trivial by decimal expansion, but you have to show that your expansions are sufficiently precise. (I'm quite sure it's possible to do better than how we ended up doing it, but that's the class of "extremely painful" we're talking about, I think.)

For the record, you can prove this with

lemma two_div_three_lt_log_two : (2 : ) / 3  log 2 :=
le_trans (by norm_num) log_two_gt_d9.le

Mario Carneiro (Jun 26 2021 at 04:33):

I think that it would be a bad idea to focus on "computational proofs" when looking at areas to improve. These are already well covered, just not in lean. The techniques are well known and this is a relatively easy task. (I'm referring to things like 2/3 <= log 2 here, not Heather's examples, which are things that a mathematician would consider computation but involve more parts of the library and are not as well-defined algorithmic.)

Kunhao Zheng (Jun 29 2021 at 11:26):

Another similar example is one of the exercises that I came across:

import data.real.basic
import data.real.sqrt

theorem aopsbook_v2_c13_ex7
  (x y z : )
  (h₀ : 0 < x  0 < y  0 < z)
  (h₁ : x * y = 12)
  (h₂ : y * z = 18)
  (h₃ : z * x = 24) :
  (x, y, z) = (4, 3, 6) :=
begin
  repeat { refine congr (congr_arg prod.mk _) _ }; nlinarith,
end

Here nlinarith can close the goal soundly, but if I add a factor of real.sqrt 2, that will make nlinarith loss its way.

import data.real.basic
import data.real.sqrt

theorem aopsbook_v2_c13_ex7
  (x y z : )
  (h₀ : 0 < x  0 < y  0 < z)
  (h₁ : x * y = 12)
  (h₂ : y * z = 18  * real.sqrt 2)
  (h₃ : z * x = 24  * real.sqrt 2) :
  (x, y, z) = (4, 3, 6 * real.sqrt 2) :=
begin
  -- repeat { refine congr (congr_arg prod.mk _) _ }; nlinarith,
  sorry
end

I wonder if it's a good direction for GPT-f to target this kind of "not-fully automated" but somehow "mathematically trivial" proofs? If I make an analogy it will look like proving ( ; ; 1 2 5 gcd ; ; 2 0 0 ) = ; 2 5 in Metamath, for which we can generate many structured proofs for training and then the machine is super capable in it.

I would like to know if GPT-f can mitigate this and guide people through the computation playing a good role of assistant, is this a need of the community? Happy to hear what you think about it.

Kunhao Zheng (Jun 29 2021 at 11:37):

One thing that I worry about is that this sounds kind of like a temporally solution: since this "not-fully automated" may be well mitigated by improving these high level tactics to make them more powerful, someday in the future an update of mathlib may make this effort become useless.

Hunter Monroe (Jul 04 2021 at 04:56):

@Daniel Selsam automation to fill in proofs and subproofs seems like a promising target for machine learning since there is 100 percent ground truth if the resulting proof is valid (like DeepMind playing Breakout), although theory and experience provide some basis to wonder how easily this task can be automated. Human intervention will need to come from people who intimately understand the thing they are trying to at least partially automate, so your NLP friends may want to gain some experience with formalizing. Translation of formal proofs from other systems into Lean should be far easier than translating informal proofs into Lean, but experience also indicates that is hard. Mario's MM0 can translate Metamath proofs into Lean, which is impressive, but then you need to match up Metamath and Lean concepts (automation could surely help), and even then, a 500K line proof of Dirichlet's theorem cannot be added to mathlib (better tools/automation for refactoring again might help).


Last updated: Dec 20 2023 at 11:08 UTC