Zulip Chat Archive

Stream: general

Topic: English-Math to Lean4 Conversion?


Daniel Donnelly (Jun 14 2022 at 19:45):

So English-Math is what I call the language that modern textbooks or well-formed MSE posts are written in. When will we be able to talk about math and have it convert into formally-verified Lean4 code? Who's working on that? What are their results?

Bart Michels (Jun 14 2022 at 20:06):

The main difficulty here I think is that conventional mathematical texts contain gaps too big to be filled in by automatic theorem provers (currently).
For the "who's working on that" part: There is the Naproche project, which is able to parse mathematical texts written in a controlled natural language: http://www.naproche.net/. If you download Isabelle, it comes with Naproche and example documents in Naproche, so you can get a feeling of what it is able to do. This blog post also has some examples: https://jiggerwit.wordpress.com/2019/06/20/an-argument-for-controlled-natural-languages-in-mathematics/
Other people are certainly thinking about this; here is an article (from a series) that tries neural methods: https://arxiv.org/abs/1912.02636

Bart Michels (Jun 14 2022 at 20:09):

But note that the focus there is on individual sentences. I think a hard part of the problem is when texts don't name all variables explicitly and contain pronouns, or when sentences are more complicated than object-verb-subject or similar constructs.

Alistair Tucker (Jun 14 2022 at 21:09):

Someone wrote very recently they were attempting something along these lines

Alistair Tucker (Jun 14 2022 at 21:09):

Here https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/lean4.20induction.20with.20multiple.20base.20cases

Jason Rute (Jun 14 2022 at 21:43):

The word you are looking for is probably "auto-formalization". It is a quite challenging topic for the following reasons:

  • There isn't a lot of aligned informal/formal data. (And certainly not for Lean 4 which has almost no data).
  • It isn't the sort of thing you can just build a parser to do. (We are having enough trouble translating idomatically between Lean 3 and Lean 4.)
  • If you are talking about mathematical textbooks, one has to not only translate the theorem and proof, but also all the definitions (or figure out what definitions in Lean are equivalent). This makes it very different from natural language machine translation.

Jason Rute (Jun 14 2022 at 21:43):

You can find a lot written about this topic by searching for "autoformalization" and "auto-formalization" here on Zulip (as well as on Google), especially within the #Machine Learning for Theorem Proving stream. Qingxiang Wang did a lot of early work on this with Josef Urban, but I don't know what happened to them. Google's N2Formal (Natural to Formal) group has shown a lot of interest in this topic even if they don't have many published results yet. Of particular interest is the very recent article by them (https://arxiv.org/abs/2205.12615), discussed here: #Machine Learning for Theorem Proving > Paper: Autoformalization with Large Language Models.

Jason Rute (Jun 14 2022 at 21:58):

@Augustin d'Oultremont You might be interested in this discussion (and the links I just posted) given what you said in the other thread about your informal to formal thesis (and that you were mentioned above).

Jason Rute (Jun 14 2022 at 22:10):

I wonder if we should merge this thread with the previous time the OP asked this question: #general > Converting informal math language to Lean.

Jason Rute (Jun 16 2022 at 01:22):

This question was asked a day too early. :smile: See the very interesting looking project just announced: #Machine Learning for Theorem Proving > A chat interface for formalizing theorem statements by @Edward Ayers and @Zhangir Azerbayev


Last updated: Dec 20 2023 at 11:08 UTC