Zulip Chat Archive

Stream: new members

Topic: Introduction


Cody Johnson (Oct 25 2021 at 15:37):

Hi Lean community, I just joined the Zulip and I'm excited to get to know you all. I've been working on automated theorem proving for about 1.5 years, and I'm mainly interested in the goal of translating natural language into Lean. Nice to meet you all!

Kevin Buzzard (Oct 25 2021 at 16:19):

Does anyone know what happened to Hales' work on a controlled natural language for Lean?

Kevin Buzzard (Oct 25 2021 at 16:20):

I know that Barnet-Lamb has also been thinking about this kind of thing, relating the Ganesalingam work to Lean.

Jason Rute (Oct 25 2021 at 23:19):

Hi @Cody Johnson! What does "translating natural language into Lean" mean to you? Here are some possible interpretations:

  1. Automatically translate natural language math (statements, definitions, and/or proofs) into Lean code (sometimes called auto-formalization).
  2. Manually convert math (usually definitions and theorem statements) into something resembling natural language (a controlled natural language) that is also machine-readable and automatically convertible to Lean code.
  3. Take a math theorem and its proof and formalize them in Lean or another formal proof language.

Cody Johnson (Oct 25 2021 at 23:55):

Hi @Jason Rute, I'd say 1. My (admittedly over-ambitious) goal is to be able to take raw English/LaTeX and automatically output the Lean code it represents. To do this, I've been analyzing commonly used words and phrases in math proofs and using them to specify a PEG grammar for English/LaTeX.
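
Just as a toy illustration (not my actual grammar), the kind of phrase rules I mean could be written with Lean 4's own PEG-style parser like this:

```lean
-- Toy controlled-English grammar fragment written with Lean 4's built-in
-- (PEG-like) parser.  Each `syntax` rule is one phrase pattern that shows up
-- over and over in informal proofs.  Purely illustrative.
declare_syntax_cat mathPhrase

syntax ident "divides" ident : mathPhrase                        -- "d divides n"
syntax "for" "all" ident "," mathPhrase : mathPhrase             -- "for all n, ..."
syntax "there" "exists" ident "such" "that" mathPhrase : mathPhrase

-- Hook the category into the term grammar just to check that phrases parse;
-- there are no semantics yet, so every phrase elaborates to `True`.
syntax "phrase!" "[" mathPhrase "]" : term
macro_rules
  | `(phrase! [ $_p ]) => `(True)

#check (phrase! [ for all n , there exists d such that d divides n ])
```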

Yury G. Kudryashov (Oct 25 2021 at 23:59):

I bet it's impossible to write a nice grammar that is actually useful to convert real-world papers from natural language to Lean.

Cody Johnson (Oct 26 2021 at 00:02):

Yeah, it's certainly challenging. I'm currently trying to translate some existing papers, just as an exercise, to see firsthand what makes it so hard.

Jason Rute (Oct 26 2021 at 00:04):

I tend to agree with Yury. I do think modern neural language modeling is nearly up to the task, however. The big challenges are (1) lack of good data (in particular, the small amount of Lean data), and (2) for any statement about a new or uncommon mathematical idea, you have to formalize the definition as well, so it isn't really like natural language translation.

Yury G. Kudryashov (Oct 26 2021 at 00:04):

I guess it might be possible to use machine learning to extract some statements (and AFAIR someone was working on this), but I have no experience with machine learning.

Yury G. Kudryashov (Oct 26 2021 at 00:06):

If we also have a cucumber-style way to state Lean theorems (which should be possible with the new Lean 4 macro features), a mathematician without a Lean background should be able to proofread the output.
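
Something like this minimal Lean 4 sketch (everything here is made up for illustration, not an existing library):

```lean
-- Minimal sketch of a "cucumber-style" statement layer via Lean 4 macros.

-- Self-contained definitions so the example needs no imports.
def IsEven (n : Nat) : Prop := ∃ k, n = 2 * k
def IsOdd  (n : Nat) : Prop := ∃ k, n = 2 * k + 1

-- The English-looking surface form is a term-level macro...
syntax "Claim" ":" "every" "natural" "number" "is" "even" "or" "odd" : term

-- ...that expands to the ordinary proposition Lean actually sees.
macro_rules
  | `(Claim : every natural number is even or odd) =>
    `(∀ n : Nat, IsEven n ∨ IsOdd n)

#check (Claim : every natural number is even or odd)
-- ∀ (n : Nat), IsEven n ∨ IsOdd n : Prop
```

Of course a real version would need parameterized templates rather than one hard-coded sentence, but the mechanism is there: a mathematician proofreads the surface statement, and Lean checks the elaborated proposition.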

Cody Johnson (Oct 26 2021 at 00:11):

One problem with neural language models is that they can't reach 100% accuracy, which this problem really needs. At the very least, you'd need your neural nets to be able to identify when they're outputting junk (false positives) for people to be able to trust them for peer review. I think grammar-based approaches can achieve this accuracy, and with PCFG methods you're able to interact with the user and say "I tried to translate this and got it wrong. Did you mean ____?" Plus, grammars are a pretty mature technology; for instance, the Penn Treebank system is almost 30 years old.

Jason Rute (Oct 26 2021 at 00:13):

Szegedy's N2Formal team at Google has been interested in this for a while. Their manifesto is A Promising Path Towards Autoformalization and General Artificial Intelligence (it used to be freely available as a preprint, but I guess not anymore). They also have a talk about some partial progress (where they ran into a lot of roadblocks); it is the last talk in this group of AITP 2020 talks.

Further, Josef Urban, Qingxiang Wang, et al. have been playing around with this for a while. I think they have some impressive smaller-scale results, for example First Experiments with Neural Translation of Informal to Formal Mathematics and other results by Qingxiang Wang.

Jason Rute (Oct 26 2021 at 00:16):

Sure, accuracy is an issue (and sometimes even an issue in human interpretation of mathematics, because of subtleties), but I personally think of grammars and the like as only ever being able to reach a small local maximum. We need more flexible tools to handle the messiness of natural language, and then we need to improve those tools to give better results.

Jason Rute (Oct 26 2021 at 00:17):

But I also don't mean to get into a flame war here. (I did that once and it wasn't fun.) I'm happy to disagree and be proved wrong later.

Yury G. Kudryashov (Oct 26 2021 at 00:18):

An important problem with reading (and formalizing) natural language proofs is that something like this is acceptable in a paper: “The following lemma was proved in [5] for $C^2$ smooth functions. The proof works with minor modifications for $C^{2+\alpha}$ smooth functions and gives the following estimates.”, where [5] is a 50-page-long paper and the reference gives neither a theorem number nor a page number.

Jason Rute (Oct 26 2021 at 00:20):

And "minor modifications" is often an understatement.

Cody Johnson (Oct 26 2021 at 14:10):

This is exactly why automated peer review is so important: for catching things like these.

Scott Morrison (Oct 27 2021 at 00:33):

I mean, I don't think "catching" is the right word here. The reviewers read that sentence, and probably knew perfectly well that [5] is 50 pages long and that modifications were required. A good reviewer would even have spent some time thinking about whether they believed the changes would be "easy". Everyone involved (author, reviewer, editor, reader) would be unhappy if they were told that one wasn't allowed to do this.


Last updated: Dec 20 2023 at 11:08 UTC