Zulip Chat Archive

Hi Lean community, I just joined the Zulip and I'm excited to get to know you all. I've been working on automated theorem proving for about 1.5 years, and I'm mainly interested in the goal of translating natural language into Lean. Nice to meet you all!

Kevin Buzzard (Oct 25 2021 at 16:19):

Does anyone know what happened to Hales' work on a controlled natural language for Lean?

Kevin Buzzard (Oct 25 2021 at 16:20):

I know that Barnet-Lamb has also been thinking about this kind of thing, relating the Ganelasingham work to Lean.

Jason Rute (Oct 25 2021 at 23:19):

Hi @Cody Johnson! What does "translating natural language into Lean" mean to you? As for what I mean, here are some possible interpretations:

Automatically translate natural language math (either statements, definitions, and/or proofs) into Lean code (auto-formalization as it is sometimes called).
Manually convert math (usually definitions and theorem statements) into something resembling natural language (controlled natural language), but which is also machine readable and convertible automatically to Lean code.
Take a math theorem and it's proof and formalize it in Lean or another formal proof language.

Cody Johnson (Oct 25 2021 at 23:55):

Hi @Jason Rute , I'd say 1. My (admittedly over-ambitious) goal is to be able to take raw English/LaTeX and automatically output the Lean code it represents. In order to do this, I've been analyzing commonly used words and phrases in math proofs and using it to specify a PEG grammar for English/LaTeX.

Yury G. Kudryashov (Oct 25 2021 at 23:59):

I bet it's impossible to write a nice grammar that is actually useful to convert real-world papers from natural language to Lean.

Cody Johnson (Oct 26 2021 at 00:02):

Yeah, it's sure challenging. I'm currently trying to translate some existing papers just as an exercise to see firsthand what makes it so hard.

Jason Rute (Oct 26 2021 at 00:04):

I tend to agree with Yury. I do think modern neural language modeling is nearly up for the task however. The big challenges are (1) lack of good data (in particular low amounts of Lean data), and (2) for any statement about a new or uncommon mathematical idea, you have to formalize the definition as well so it isn't really like natural language translation.

Yury G. Kudryashov (Oct 26 2021 at 00:04):

I guess, it might be possible to use machine learning to extract some statement (and AFAIR someone was working on this) but I have no experience with machine learning.

Yury G. Kudryashov (Oct 26 2021 at 00:06):

If we also have a cucumber-style way to state Lean theorems (should be possible with new Lean 4 macro features), a mathematician without Lean background should be able to proofread the output.

Cody Johnson (Oct 26 2021 at 00:11):

One problem with neural language modeling is they aren't able to get 100% accuracy which this problem really needs. At very least you'd need your neural nets to be able to identify when they're outputting junk for people to be able to trust them for peer review (false positives). I think grammar based approaches are able to achieve this accuracy, plus with PCFG methods you're able to interact with the user and say "I tried to translate this and got it wrong. Did you mean ____?" Plus grammars are pretty mature, for instance the Penn Treebank system is almost 30 years old.

Jason Rute (Oct 26 2021 at 00:13):

Szegedy's N2Formal team at Google has been interested in this for a while. Their manifesto is A Promising Path Towards Autoformalization and General Artificial Intelligence (It used to be freely available as a preprint, but I guess not anymore.) Also, they have a talk about some partial progress (where they ran into a lot of roadblocks). It is the last talk on this group of AITP 2020 talks.

Further, Josef Urban, Qingxiang Wang, et al have been playing around with this for a while. I think they have some impressive smaller scale results, for example First Experiments with Neural Translation of Informal to Formal Mathematics and other results by Qingxiang Wang.

Jason Rute (Oct 26 2021 at 00:16):

Sure accuracy is an issue (and sometime even an issue in human interpretation of mathematics because of subtleties), but I personally think of grammars and the like as only ever being able to reach a small local maximum. We need more flexible tools to handle the messiness of natural language, and then we need to improve those tools to give better results.

Jason Rute (Oct 26 2021 at 00:17):

But I also don't mean to get into a flame war here. (I did that once and it wasn't fun.) I'm happy to disagree and be proved wrong later.

Yury G. Kudryashov (Oct 26 2021 at 00:18):

An important problem with reading (and formalizing) natural language proof is that something like this is acceptable in a paper: “The following lemma was proved in [5] for $C^2$ smooth functions. The proof works with minor modifications for $C^{2+\alpha}$ smooth functions and gives the following estimates.”, where [5] is a 50-pages long paper and there is neither theorem nor page number in the reference.

Jason Rute (Oct 26 2021 at 00:20):

And "minor modifications" is often an understatement.

Cody Johnson (Oct 26 2021 at 14:10):

This is the exact reason why automated peer review is so important, for catching things like these.

Scott Morrison (Oct 27 2021 at 00:33):

I mean, I don't think "catching" is the right word here. The reviewers read that sentence, and probably knew perfectly well that [5] is 50-pages long and required modifications. A good reviewer would have even spent some time thinking about whether they believed the changes would be "easy". Everyone involved, author/reviewer/editor/reader would be unhappy if they were told that one wasn't allowed to do this.

Richard Kiss (Mar 12 2024 at 20:24):

My name is Richard Kiss. I have a BSc in math from SFU (Burnaby, B.C.) and an MA in math from UCLA (I dropped out of a PhD program from there in 1994). I have been a programmer both as a kid and professionally for many years now.

I find lean fascinating. My goal is to use it to prove soundness of smart contracts (and what exactly that means remains to be seen) for Chia Network's cryptocurrency. I helped develop the smart contracting language, which is a dialect of lisp called ChiaLisp that compiles to a VM language called clvm. I like to say "CLVM is what you'd get if lisp and assembly language had a baby".

I don't remember much math beyond undergraduate so a lot of Mathlib is over my head, but also not relevant to my goals (except maybe some elliptic curve stuff use for bls12_381 crypto). Much more of work has to do with computability and proving theorems about programs, both in clvm and in the clvm interpreter, which I've reimplemented in Lean 4.

I'm currently trying to produce a useful set of base lemmas and theorems about clvm that will be a useful basis for proving things about programs written in the language. It's been challenging and I get stuck a lot, mostly due to inexperience and lack of knowledge.

Patrick Massot (Mar 12 2024 at 20:44):

@Richard Kiss Welcome! Do you know about https://arxiv.org/abs/2109.14534 and related work?

Richard Kiss (Mar 12 2024 at 20:48):

I've heard of STARKNET but I wasn't aware they'd done anything like this, very interesting!

Patrick Massot (Mar 12 2024 at 21:35):

Jeremy Avigad can be found here but he is extremely busy this week.

Christian K (Mar 13 2024 at 16:47):

Hi, my Name is Christian Krause. I am a 16-year-old high school student in germany and I'm working together with another student to formalize the banach-tarski Paradox in Lean. I'm not that experienced with math, but I have some experience with "conventional" programming languages (python, C, etc.). The other student in our Team is more experienced with math, so we are currently working on the foundations for the Banach Tarski Paradox. Our Blueprint (and some other stuff) is at https://bergschaf.github.io/banach-tarski/. We are still at the beginning of the project, but we've made some progress, a first lemma is already in the mathlib.

Patrick Massot (Mar 14 2024 at 01:23):

I’m curious: did you create this blueprint using the new command line tool or by copy-pasting from another project?

Christian K (Mar 14 2024 at 09:41):

I created the blueprint before the command line Tool, so i took some "inspiration" from the pfr repo. But I also forked your blueprint repo because a recent commit caused some very weird colors, which made the Text in the bubbles unreadable.

Patrick Massot (Mar 14 2024 at 13:19):

You should open a GitHub issue if this color issue persists on master.

Christian K (Mar 14 2024 at 13:31):

Yeah, I'll try the new Version when I have time and check if the issue persists. And another thing, it dosen't look like the new Version has an option to specify a color for \mathlibok. Is there a reason for this or is this another thing that could be implemented?

Patrick Massot (Mar 14 2024 at 14:06):

It’s probably forgotten because people don’t use this command much. You can also open an issue about that.

Christian K (Mar 14 2024 at 14:08):

Ok, I think the command is quite useful, so I added a first PR to neutralize the command in print.tex so it dosen't create any errors (especially with CI)

Last updated: May 02 2025 at 03:31 UTC

leanprover-community / mathlib