Zulip Chat Archive
Stream: new members
Topic: Introduction
Cody Johnson (Oct 25 2021 at 15:37):
Hi Lean community, I just joined the Zulip and I'm excited to get to know you all. I've been working on automated theorem proving for about 1.5 years, and I'm mainly interested in the goal of translating natural language into Lean. Nice to meet you all!
Kevin Buzzard (Oct 25 2021 at 16:19):
Does anyone know what happened to Hales' work on a controlled natural language for Lean?
Kevin Buzzard (Oct 25 2021 at 16:20):
I know that Barnet-Lamb has also been thinking about this kind of thing, relating the Ganesalingam work to Lean.
Jason Rute (Oct 25 2021 at 23:19):
Hi @Cody Johnson! What does "translating natural language into Lean" mean to you? As for what I mean, here are some possible interpretations:
1. Automatically translate natural language math (either statements, definitions, and/or proofs) into Lean code (auto-formalization, as it is sometimes called).
2. Manually convert math (usually definitions and theorem statements) into something resembling natural language (controlled natural language), but which is also machine readable and convertible automatically to Lean code.
3. Take a math theorem and its proof and formalize it in Lean or another formal proof language.
Cody Johnson (Oct 25 2021 at 23:55):
Hi @Jason Rute , I'd say 1. My (admittedly over-ambitious) goal is to be able to take raw English/LaTeX and automatically output the Lean code it represents. In order to do this, I've been analyzing commonly used words and phrases in math proofs and using them to specify a PEG grammar for English/LaTeX.
Yury G. Kudryashov (Oct 25 2021 at 23:59):
I bet it's impossible to write a nice grammar that is actually useful to convert real-world papers from natural language to Lean.
Cody Johnson (Oct 26 2021 at 00:02):
Yeah, it sure is challenging. I'm currently trying to translate some existing papers just as an exercise to see firsthand what makes it so hard.
Jason Rute (Oct 26 2021 at 00:04):
I tend to agree with Yury. I do think modern neural language modeling is nearly up to the task, however. The big challenges are (1) lack of good data (in particular low amounts of Lean data), and (2) for any statement about a new or uncommon mathematical idea, you have to formalize the definition as well, so it isn't really like natural language translation.
Yury G. Kudryashov (Oct 26 2021 at 00:04):
I guess it might be possible to use machine learning to extract some statements (and AFAIR someone was working on this), but I have no experience with machine learning.
Yury G. Kudryashov (Oct 26 2021 at 00:06):
If we also have a cucumber-style way to state Lean theorems (should be possible with new Lean 4 macro features), a mathematician without Lean background should be able to proofread the output.
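Something like this toy macro is what I have in mind (just a sketch I haven't tested, and the surface syntax is made up):
macro "For every natural number " x:ident "we have " p:term : term =>
  `(∀ $x : Nat, $p)

-- so `For every natural number n we have n + 0 = n` elaborates to `∀ n : Nat, n + 0 = n`
example : For every natural number n we have n + 0 = n := fun n => rfl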
Cody Johnson (Oct 26 2021 at 00:11):
One problem with neural language modeling is that such models aren't able to get 100% accuracy, which this problem really needs. At the very least you'd need your neural nets to be able to identify when they're outputting junk for people to be able to trust them for peer review (false positives). I think grammar-based approaches are able to achieve this accuracy, plus with PCFG methods you're able to interact with the user and say "I tried to translate this and got it wrong. Did you mean ____?" Plus grammars are pretty mature; for instance, the Penn Treebank system is almost 30 years old.
Jason Rute (Oct 26 2021 at 00:13):
Szegedy's N2Formal team at Google has been interested in this for a while. Their manifesto is A Promising Path Towards Autoformalization and General Artificial Intelligence (It used to be freely available as a preprint, but I guess not anymore.) Also, they have a talk about some partial progress (where they ran into a lot of roadblocks). It is the last talk on this group of AITP 2020 talks.
Further, Josef Urban, Qingxiang Wang, et al have been playing around with this for a while. I think they have some impressive smaller scale results, for example First Experiments with Neural Translation of Informal to Formal Mathematics and other results by Qingxiang Wang.
Jason Rute (Oct 26 2021 at 00:16):
Sure, accuracy is an issue (and sometimes even an issue in human interpretation of mathematics because of subtleties), but I personally think of grammars and the like as only ever being able to reach a small local maximum. We need more flexible tools to handle the messiness of natural language, and then we need to improve those tools to give better results.
Jason Rute (Oct 26 2021 at 00:17):
But I also don't mean to get into a flame war here. (I did that once and it wasn't fun.) I'm happy to disagree and be proved wrong later.
Yury G. Kudryashov (Oct 26 2021 at 00:18):
An important problem with reading (and formalizing) natural language proofs is that something like this is acceptable in a paper: “The following lemma was proved in [5] for $C^2$ smooth functions. The proof works with minor modifications for $C^{2+\alpha}$ smooth functions and gives the following estimates.”, where [5] is a 50-page-long paper and there is neither a theorem nor a page number in the reference.
Jason Rute (Oct 26 2021 at 00:20):
And "minor modifications" is often an understatement.
Cody Johnson (Oct 26 2021 at 14:10):
This is the exact reason why automated peer review is so important, for catching things like these.
Scott Morrison (Oct 27 2021 at 00:33):
I mean, I don't think "catching" is the right word here. The reviewers read that sentence, and probably knew perfectly well that [5] is 50 pages long and that the proof required modifications. A good reviewer would even have spent some time thinking about whether they believed the changes would be "easy". Everyone involved, author/reviewer/editor/reader, would be unhappy if they were told that one wasn't allowed to do this.
Richard Kiss (Mar 12 2024 at 20:24):
My name is Richard Kiss. I have a BSc in math from SFU (Burnaby, B.C.) and an MA in math from UCLA (I dropped out of a PhD program from there in 1994). I have been a programmer both as a kid and professionally for many years now.
I find lean fascinating. My goal is to use it to prove soundness of smart contracts (and what exactly that means remains to be seen) for Chia Network's cryptocurrency. I helped develop the smart contracting language, which is a dialect of lisp called ChiaLisp that compiles to a VM language called clvm. I like to say "CLVM is what you'd get if lisp and assembly language had a baby".
I don't remember much math beyond undergraduate, so a lot of Mathlib is over my head, but also not relevant to my goals (except maybe some elliptic curve stuff used for bls12_381 crypto). Much more of my work has to do with computability and proving theorems about programs, both in clvm and in the clvm interpreter, which I've reimplemented in Lean 4.
I'm currently trying to produce a useful set of base lemmas and theorems about clvm that will be a useful basis for proving things about programs written in the language. It's been challenging and I get stuck a lot, mostly due to inexperience and lack of knowledge.
Patrick Massot (Mar 12 2024 at 20:44):
@Richard Kiss Welcome! Do you know about https://arxiv.org/abs/2109.14534 and related work?
Richard Kiss (Mar 12 2024 at 20:48):
I've heard of STARKNET but I wasn't aware they'd done anything like this, very interesting!
Patrick Massot (Mar 12 2024 at 21:35):
Jeremy Avigad can be found here but he is extremely busy this week.
Christian K (Mar 13 2024 at 16:47):
Hi, my name is Christian Krause. I am a 16-year-old high school student in Germany and I'm working together with another student to formalize the Banach-Tarski paradox in Lean. I'm not that experienced with math, but I have some experience with "conventional" programming languages (Python, C, etc.). The other student in our team is more experienced with math, so we are currently working on the foundations for the Banach-Tarski paradox. Our blueprint (and some other stuff) is at https://bergschaf.github.io/banach-tarski/. We are still at the beginning of the project, but we've made some progress; a first lemma is already in Mathlib.
Patrick Massot (Mar 14 2024 at 01:23):
I’m curious: did you create this blueprint using the new command line tool or by copy-pasting from another project?
Christian K (Mar 14 2024 at 09:41):
I created the blueprint before the command line tool existed, so I took some "inspiration" from the pfr repo. But I also forked your blueprint repo because a recent commit caused some very weird colors, which made the text in the bubbles unreadable.
Patrick Massot (Mar 14 2024 at 13:19):
You should open a GitHub issue if this color issue persists on master.
Christian K (Mar 14 2024 at 13:31):
Yeah, I'll try the new version when I have time and check if the issue persists. And another thing: it doesn't look like the new version has an option to specify a color for \mathlibok. Is there a reason for this or is this another thing that could be implemented?
Patrick Massot (Mar 14 2024 at 14:06):
It’s probably forgotten because people don’t use this command much. You can also open an issue about that.
Christian K (Mar 14 2024 at 14:08):
Ok, I think the command is quite useful, so I added a first PR to neutralize the command in print.tex so it doesn't create any errors (especially with CI).
Ewan Davies (Sep 29 2025 at 21:26):
Hi Lean community,
I'm a computer science professor starting to investigate Lean for formalizing proofs in the mathy part of computer science where I do research. I'm struggling with learning the Lean version of all the mathematical facts that I know, and will have lots of questions for you...
My long-term goal is to be able to prove any of my own results in Lean. Clearly, this is a long way off. For now, I'd be happy if I can prove all of the results in the undergraduate class that I teach which introduces computer scientists to mathematical proof. Here's one I'm a bit stuck on.
Prove for all integers $n \geq 4$ that $4^n \geq n^4$. I have the induction sketched out, but in the middle of the inductive step one needs the following lemma:
lemma simple_algebra (n : ℕ) (hn : 4 ≤ n) : 4 * n^4 ≥ (n + 1)^4 := by
  calc 4*n^4
    _ = n^4 + n^4 + n^4 + n^4 := by ring
    _ ≥ n^4 + 4*n^3 + 6*n^2 + (4*n + 1) := by
      gcongr
      -- have hn : 4 ≤ n so a single nlinarith should sort out all these goals, no?
      . -- goal is 4*n^3 ≤ n^4
        have : 4 * n^3 ≤ n * n^3 := by gcongr
        nlinarith
      . -- goal is 6*n^2 ≤ n^4
        have : 6 ≤ n^2 := by nlinarith
        nlinarith
      . -- goal is 4*n + 1 ≤ n^4
        have : 4*n ≤ n^2 := by nlinarith
        nlinarith
    _ = (n + 1)^4 := by ring
I'm a bit disappointed that I can't complete the subgoals with nlinarith alone, and I have no idea what's going on when I use these tactics. So any insight into my proof would be appreciated.
(for context, it took me, Gemini and ChatGPT a while to find this proof by starting with a high-level human proof, translating it with lots of sorries, and iterating from there).
Lawrence Wu (llllvvuu) (Sep 29 2025 at 21:46):
This probably doesn't improve insight very much, but the following also works:
import Mathlib.Tactic.Ring.RingNF
lemma simple_algebra (n : ℕ) (hn : 4 ≤ n) : 4 * n^4 ≥ (n + 1)^4 := by
  obtain ⟨n, rfl⟩ := Nat.exists_eq_add_of_le hn
  ring_nf
  grind -- `nlinarith` also works. So does `gcongr <;> norm_num`
Dan Velleman (Sep 29 2025 at 22:00):
Here's another approach: First show that for $n \geq 4$, $4(n+1) \leq 5n$. Therefore $4^4(n+1)^4 \leq 5^4 n^4$. Next, use the fact that $5^4 \leq 4 \cdot 4^4$ to conclude that $4^4(n+1)^4 \leq 4^4 \cdot 4n^4$. Finally, cancel $4^4$ to get the conclusion you wanted.
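In Lean this might look roughly as follows (an untested sketch; the lemma name is mine):
import Mathlib.Tactic

lemma simple_algebra' (n : ℕ) (hn : 4 ≤ n) : 4 * n ^ 4 ≥ (n + 1) ^ 4 := by
  show (n + 1) ^ 4 ≤ 4 * n ^ 4
  -- the hypothesis 4 ≤ n is exactly what makes 4 * (n + 1) ≤ 5 * n true
  have h1 : 4 * (n + 1) ≤ 5 * n := by omega
  -- raise both sides to the fourth power
  have h2 : 256 * (n + 1) ^ 4 ≤ 625 * n ^ 4 := by
    calc 256 * (n + 1) ^ 4 = (4 * (n + 1)) ^ 4 := by ring
      _ ≤ (5 * n) ^ 4 := Nat.pow_le_pow_left h1 4
      _ = 625 * n ^ 4 := by ring
  -- use 625 ≤ 4 * 256 to put a common factor of 256 on both sides
  have h3 : 256 * (n + 1) ^ 4 ≤ 256 * (4 * n ^ 4) := by
    calc 256 * (n + 1) ^ 4 ≤ 625 * n ^ 4 := h2
      _ ≤ 1024 * n ^ 4 := by gcongr <;> norm_num
      _ = 256 * (4 * n ^ 4) := by ring
  -- cancel the factor of 256
  exact Nat.le_of_mul_le_mul_left h3 (by norm_num)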
Ewan Davies (Sep 29 2025 at 22:08):
Lawrence Wu (llllvvuu) said:
This probably doesn't improve insight very much, but the following also works:
import Mathlib.Tactic.Ring.RingNF
lemma simple_algebra (n : ℕ) (hn : 4 ≤ n) : 4 * n^4 ≥ (n + 1)^4 := by
  obtain ⟨n, rfl⟩ := Nat.exists_eq_add_of_le hn
  ring_nf
  grind -- `nlinarith` also works. So does `gcongr <;> norm_num`
Indeed, my insight is unchanged. It's not surprising that my attempt at this as a total beginner is overly long, but how does one become "aligned" with the capabilities of lean/mathlib's tactics? It's definitely useful to see examples (how did I learn proof as a human after all?), so thank you, but at the rate I'm going I'll gain insight some time in 2030.
It seems like magic that the pattern you match with obtain has the rfl tactic in it. There's no way I could know about that working, but perhaps I could have found this:
lemma simple_algebra2 (n : ℕ) (hn : 4 ≤ n) : 4 * n^4 ≥ (n + 1)^4 := by
  -- hn : 4 ≤ n, so Nat.exists_eq_add_of_le hn gives ∃ k, n = 4 + k;
  -- intro this k and the hypothesis hnk : n = 4 + k
  obtain ⟨k, hnk⟩ := Nat.exists_eq_add_of_le hn
  rw [hnk] -- now rewrite our goal in terms of the unconstrained k ∈ ℕ
  ring_nf -- and multiply out
  nlinarith -- now nlinarith can solve it
Perhaps this is a pattern I can remember: if I'm doing Nat.le_induction then the algebra will be hard to persuade Lean to do unless I use Nat.exists_eq_add_of_le to get back an "unconstrained" natural that it can do the algebra with.
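For the record, I think the outer induction then looks something like this (untested sketch, reusing simple_algebra2 from above and assuming a full Mathlib import):
theorem n_pow_four_le_four_pow (n : ℕ) (hn : 4 ≤ n) : n ^ 4 ≤ 4 ^ n := by
  induction n, hn using Nat.le_induction with
  | base => norm_num
  | succ m hm ih =>
    calc (m + 1) ^ 4 ≤ 4 * m ^ 4 := (simple_algebra2 m hm).le
      _ ≤ 4 * 4 ^ m := by gcongr -- uses ih : m ^ 4 ≤ 4 ^ m
      _ = 4 ^ (m + 1) := by ring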
Lawrence Wu (llllvvuu) (Sep 30 2025 at 01:14):
The rfl is not the rfl tactic, but a special pattern that the rcases tactic has, which is documented if you use rcases and hover over "rcases". I agree it's a bit cheeky and not very discoverable.
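For example, these two should be equivalent (untested, with Mathlib imported); the second just inlines the substitution into the pattern:
example (n : ℕ) (hn : 4 ≤ n) : True := by
  obtain ⟨k, hk⟩ := Nat.exists_eq_add_of_le hn
  subst hk -- replace n by 4 + k everywhere
  trivial
example (n : ℕ) (hn : 4 ≤ n) : True := by
  obtain ⟨k, rfl⟩ := Nat.exists_eq_add_of_le hn -- same substitution, done by the pattern
  trivial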
Patrick Massot (Sep 30 2025 at 06:52):
Ewan Davies said:
It seems like magic that the pattern you match with obtain has the rfl tactic in it. There's no way I could know about that working, but perhaps I could have found this:
Actually there is a way: you could have read some tutorial about Lean instead of “it took me, Gemini and ChatGPT a while to find this proof”.
Patrick Massot (Sep 30 2025 at 06:53):
This trick of reading documentation is somewhat forgotten these days, but it can do wonders.
Michael Rothgang (Sep 30 2025 at 07:05):
Strongly seconded: Mathematics in Lean (like many university Lean courses) is designed to teach you Lean step by step. There is a learning curve, which is why this is useful.
At the same time, learning Lean takes effort, even with reading documentation. So don't be discouraged and feel free to ask. If you're making an effort, people are very happy to help you.
Michael Rothgang (Sep 30 2025 at 07:06):
(Some people have in the past made very low-quality posts, sometimes to the level of literally pasting the response on zulip into the LLM and copy-pasting what it said... that's what it looked like, at least. Such low-effort posting is not so welcome here.)
Ewan Davies (Oct 02 2025 at 04:11):
Patrick Massot said:
Actually there is a way: you could have read some tutorial about Lean instead of “it took me, Gemini and ChatGPT a while to find this proof”.
OK, hostility noted. Perhaps I should have been clearer and written that there's no way I could know about that merely by reading code posted on Zulip. I'm not some kind of low-effort troll and I do not appreciate your manner of communication. I'm actively learning Lean in the very manner you prescribe. But there's a reason people learn things from one another and not just by reading books. Someone with a job at a university ought to bear that in mind.
Niels Voss (Oct 02 2025 at 08:22):
Patrick Massot said:
Actually there is a way: you could have read some tutorial about Lean instead of “it took me, Gemini and ChatGPT a while to find this proof”.
Where exactly is this documented? I think it's only mentioned in one sentence of the documentation for rcases, which is somewhat obscure and hard to find. I only found out about it by reading others' Lean code; I think it's not the type of thing you could expect to learn from a tutorial.
Patrick Massot (Oct 02 2025 at 08:48):
Ewan, I’m sorry that you felt attacked, this was not my intention. I certainly never thought you are a “low-effort troll” like we see sometimes here.
Patrick Massot (Oct 02 2025 at 08:48):
I admit this situation is frustrating to me because I spent a lot of time writing documentation and we see more and more people consciously deciding to not read it and instead rely only on chat bots.
Patrick Massot (Oct 02 2025 at 08:51):
Niels Voss said:
Patrick Massot said:
Actually there is a way: you could have read some tutorial about Lean instead of “it took me, Gemini and ChatGPT a while to find this proof”.
Where exactly is this documented? I think it's only mentioned in one sentence of the documentation for rcases, which is somewhat obscure and hard to find. I only found out about it by reading others' Lean code; I think it's not the type of thing you could expect to learn from a tutorial.
It is explained in https://leanprover-community.github.io/mathematics_in_lean/C03_Logic.html#the-existential-quantifier for instance.
Niels Voss (Oct 02 2025 at 08:53):
Oh nevermind then, sorry I didn't realize
Niels Voss (Oct 02 2025 at 09:06):
For your question about why nlinarith doesn't work: I don't use nlinarith a lot, but I think that nlinarith has a hard time guessing when you want to derive a new inequality by multiplying an inequality on both sides. My (possibly incorrect) understanding is that nlinarith is not complete, so it won't come anywhere close to solving all nonlinear arithmetic problems. For the first goal, 4 * n^3 ≤ n * n^3, the missing fact seemed to be that 0 ≤ n^3 (which should be obvious given that all Nats are nonnegative). For the second and third parts it's a little bit more complicated.
Niels Voss (Oct 02 2025 at 09:21):
Interestingly (and I don't necessarily recommend you do this in actual code), replacing n^4 with n*n*n*n seems to help at least a little bit:
theorem simple_algebra (n : ℕ) (hn : 4 ≤ n) : 4 * n^4 ≥ (n + 1)^4 := by
  calc 4*n^4
    _ = n^4 + n^4 + n^4 + n^4 := by ring
    _ = n^4 + n*n*n*n + n*n*n*n + n*n*n*n := by grind
    _ ≥ n^4 + 4*n*n*n + 6*n*n + (4*n + 1) := by
      gcongr
      · nlinarith
      · -- goal is now 4 * n + 1 ≤ n * n * n * n
        have : 4*n*n*n ≤ n*n*n*n := by gcongr
        nlinarith
    _ = (n + 1)^4 := by ring
Niels Voss (Oct 02 2025 at 09:27):
What is happening in your last branch is that if a and n are at least 4, then nlinarith is capable of proving that a + 1 ≤ a * n. But it isn't really capable of proving something like n^3 ≤ n^4. gcongr works by pattern matching exactly, so gcongr can prove something like 4*n*n*n ≤ n*n*n*n, for example, but not 4*n^3 ≤ n^4.
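Concretely, that's why the first of these should go through with gcongr alone while the second needs a small detour (untested):
example (n : ℕ) (hn : 4 ≤ n) : 4 * n * n * n ≤ n * n * n * n := by
  gcongr -- every factor lines up; the leftover goal 4 ≤ n is closed by hn
example (n : ℕ) (hn : 4 ≤ n) : 4 * n ^ 3 ≤ n ^ 4 := by
  calc 4 * n ^ 3 ≤ n * n ^ 3 := by gcongr -- shapes match here
    _ = n ^ 4 := by ring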
Ewan Davies (Oct 02 2025 at 21:00):
Patrick Massot said:
Ewan, I’m sorry that you felt attacked, this was not my intention. I certainly never thought you are a “low-effort troll” like we see sometimes here.
Thanks for the apology, and I apologise for being a bit sensitive. I will strive to be a helpful and respectful member of the community, and to read a lot of carefully written documentation.
Niels Voss said:
For your question about why nlinarith doesn't work: I don't use nlinarith a lot, but I think that nlinarith has a hard time guessing when you want to derive a new inequality by multiplying an inequality on both sides. My (possibly incorrect) understanding is that nlinarith is not complete, so it won't come anywhere close to solving all nonlinear arithmetic problems. For the first goal, 4 * n^3 ≤ n * n^3, the missing fact seemed to be that 0 ≤ n^3 (which should be obvious given that all Nats are nonnegative). For the second and third parts it's a little bit more complicated.
I think my first encounter with linarith and nlinarith was made a bit difficult by not knowing their strengths and weaknesses. I couldn't tell if my overall approach was fine but the tactics slightly brittle, or if, due to intricate things like casting Nats to/from Integers (which I wasn't controlling carefully), the tactics were getting stuck in some more predictable way. The other software I often pass inequalities to is Wolfram Engine, and that is simply amazing at this kind of algebra/arithmetic. If only we could have their algorithms in Lean!
Alfredo Moreira-Rosa (Oct 04 2025 at 13:32):
Hello Ewan,
Just some words of encouragement here. Lean takes some time to grasp, and a lot of practice to master.
I also want to encourage you to follow the path you prefer for learning Lean. You absolutely can't read all the documentation without practicing and understanding how to use it.
I understand some doc writers' frustration, but the reality is that for some people the best way to learn is to practice and search the documentation when you are blocked. That's the way I learn most efficiently.
So if that's the way you started, continue; it's feasible, as that's the way I learned Lean.
As for computer science proofs, I encourage you to take baby steps with pure program proving. There is little math and few magical tactics involved in those.
It's mostly just using some key lemmas and less magical tactics, like split, ite, simp, left, right, etc.; mostly logic.
It will help you get good at some basic gymnastics and become more confident proving more complex computer science results.
I'll also link this cheat sheet; it may be of help:
https://leanprover-community.github.io/papers/lean-tactics.pdf
Michael Rothgang (Oct 05 2025 at 08:50):
Ewan Davies said:
Patrick Massot said:
Actually there is a way: you could have read some tutorial about Lean instead of “it took me, Gemini and ChatGPT a while to find this proof”.
OK, hostility noted. Perhaps I should have been clearer and written that there's no way I could know about that merely by reading code posted on Zulip. I'm not some kind of low-effort troll and I do not appreciate your manner of communication. I'm actively learning Lean in the very manner you prescribe. But there's a reason people learn things from one another and not just by reading books. Someone with a job at a university ought to bear that in mind.
A bit late, but still: I presume I also contributed to your impression of hostility. I'm sorry for this. My message was not clear enough, that's on me.
I'm sorry for presuming you used AI without reading e.g. MIL. That was not fair. I will make sure to ask in the future. I do appreciate you're learning Lean and actively asking for help.
luna elliott (Oct 05 2025 at 09:42):
I'm still not very used to lean but I have gotten to the point that I have successfully proved an easy theorem at the start of one of my papers. In my mind it would be great if mathlib was growing with the optimistic goal of eventually absorbing most of the maths literature. If this is the goal then it makes sense for theorems like this (and I plan to do much more if I can) to be added to mathlib. However I don't really know what the current vision of mathlib is. There may also be standards for proofs being written efficiently which mine probably doesn't meet (there was a lot of guessing when I made it). There are probably also conventions on distribution of lemmas through files etc. I would like to start contributing but don't know where to go from here. Should I just make it as tidy as I can and make a PR? Or is there something else I should do?
Yaël Dillies (Oct 05 2025 at 11:03):
Hey! What's the theorem in question?
luna elliott (Oct 05 2025 at 11:47):
That every T1 semigroup topology on N^N contains the usual pointwise topology (https://arxiv.org/abs/1912.07029). I don't know anything about what people are looking for in terms of pull requests :sweat_smile:.
Aaron Liu (Oct 05 2025 at 11:47):
interesting
Aaron Liu (Oct 05 2025 at 11:48):
by "contains the usual topology" you mean it's finer than it?
luna elliott (Oct 05 2025 at 11:48):
exactly
luna elliott (Oct 05 2025 at 11:51):
In Lean I gave N^N the standard topology and monoid structure and stated it like this (not sure if that was the best way to do it). Any advice for what I should do to get involved in mathlib?
theorem pointwise_is_minimal_T1 (M : Type)
    [Monoid M] [TopologicalSpace M] [T1Space M] [ContinuousMul M] (φ : M ≃* (FullTransMonoid ℕ)) :
    Continuous (φ.toFun)
Kenny Lau (Oct 05 2025 at 11:55):
That looks nice; a nitpick would be that I'm not sure the definition FullTransMonoid is necessary?
luna elliott (Oct 05 2025 at 11:59):
Thanks. Perhaps not. Perhaps I can just give \N \to \N the structure I need, but then I'm not sure how a reader of the code would know what algebraic structure I'm using (I suppose they could probably figure it out). In general I don't know how the Lean community usually handles the case where a set/type has multiple reasonable multiplicative structures which need to be handled separately, or where we want to prove facts about them individually.
Aaron Liu (Oct 05 2025 at 11:59):
luna elliott said:
theorem pointwise_is_minimal_T1 (M : Type)
    [Monoid M] [TopologicalSpace M] [T1Space M] [ContinuousMul M] (φ : M ≃* (FullTransMonoid ℕ)) :
    Continuous (φ.toFun)
by the way, docs#ContinuousMul means that uncurried multiplication is continuous, but you only seem to be assuming continuity of $x \mapsto ax$ and $x \mapsto xa$ in the paper
Aaron Liu (Oct 05 2025 at 12:00):
oh maybe I was looking at the wrong part of the paper this is fine
luna elliott (Oct 05 2025 at 12:01):
that's true! I'm still very much figuring out how to do things. If I had found the definitions for separate continuity in the documentation I would have only assumed that.
Rider Kirkpatrick (Oct 08 2025 at 12:21):
luna elliott said:
I'm still not very used to lean but I have gotten to the point that I have successfully proved an easy theorem at the start of one of my papers. In my mind it would be great if mathlib was growing with the optimistic goal of eventually absorbing most of the maths literature. If this is the goal then it makes sense for theorems like this (and I plan to do much more if I can) to be added to mathlib. However I don't really know what the current vision of mathlib is. There may also be standards for proofs being written efficiently which mine probably doesn't meet (there was a lot of guessing when I made it). There are probably also conventions on distribution of lemmas through files etc. I would like to start contributing but don't know where to go from here. Should I just make it as tidy as I can and make a PR? Or is there something else I should do?
As a fellow mathlib novice, I would like to recommend the following pages to you:
vision and goals of mathlib:
- https://mathlib-initiative.org/roadmap/
- https://mathlib-initiative.org/about/
- https://arxiv.org/abs/1910.09336 (this is older)
contribution and style guidelines:
- https://leanprover-community.github.io/contribute/index.html (also contains information on what kinds of contributions can be made)
- https://leanprover-community.github.io/contribute/naming.html
- https://leanprover-community.github.io/contribute/style.html
- https://leanprover-community.github.io/contribute/doc.html
- https://leanprover-community.github.io/contribute/commit.html
Last updated: Dec 20 2025 at 21:32 UTC