Zulip Chat Archive
Stream: new members
Topic: Kantorovich's theorem on Newton's method, fully formalized!
Victor Liu (Aug 07 2024 at 15:20):
Hi there! Today I managed to fully formalize the Newton-Kantorovich theorem in Lean 4, specifically the one-constant version based on this paper, with no remaining errors!
The full source code can be found at this repository under my GitHub account.
I have decided that it is not very useful to contribute this theorem to Mathlib for now, since to my knowledge there aren't really any other theorems that depend on this one, even less on this version, which is weaker than the one found on Wikipedia. Maybe if Mathlib++ ever gets created, it could be added there, although that would take a significant amount of refactoring, because the code is quite long/verbose and does NOT follow the conventions at all, for the sake of development speed.
This was an interesting journey, consisting mostly of two parts.
- **Finding the Newton-Kantorovich theorem**
I first learnt about Newton's method in calculus. I was looking for numerical ways of finding zeroes of functions while studying some physics-related optimization problems, and Newton's method was one of them (I won't delve into the details too much, as it would take too long to trace out :sweat_smile:).
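For anyone unfamiliar, the method itself is just the familiar iteration (assuming $f'(x_n) \neq 0$ at every step):
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$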
Anyway, it was interesting and also frustrating to see that most easily accessible resources would say that Newton's method converges to a zero if you start "close enough" to a root, but "not always" if the function is not "well-behaved". Of course, this was quite annoying: you would already need a good idea of where the root is to be able to converge to it, and even then, you couldn't be sure whether you were actually "close enough" or whether you would be unlucky enough for the iteration to fly off somewhere else.
Naturally, I wanted to be able to quantify "close enough". I looked around quite a lot, and it was quite hard to find answers to my question. Recently, though, I prodded some large language models and discovered the Newton-Kantorovich theorem, which gives sufficient initial conditions to ensure convergence to a root within an interval. Finally, there were some answers to my question!
However, I am not well-versed in advanced mathematics. My knowledge is pretty much limited to what is available on Khan Academy, where I could find math courses for free. So, when I first read the statement of the theorem on the Wikipedia page, I didn't know what most of the conditions meant. To my sorrow, I then learnt that in order to ensure convergence, you need to compute a Lipschitz bound on the derivative of your target function over an interval, which turns out to be even harder than finding zeroes most of the time. :melting_face:
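To give an idea of what the quantified conditions look like (this is a rough sketch of the flavor, glossing over the precise hypotheses in the paper): if $F \colon \Omega \subseteq X \to Y$ is differentiable between Banach spaces, $F'(x_0)$ is invertible, and
$$\|F'(x_0)^{-1}(F'(y) - F'(x))\| \le L \|y - x\|, \qquad \|F'(x_0)^{-1} F(x_0)\| \le b, \qquad 2bL \le 1,$$
then the Newton iterates starting from $x_0$ converge to a zero of $F$ inside the closed ball of radius $t_* = \frac{1 - \sqrt{1 - 2bL}}{L}$ around $x_0$.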
So, I put this aside, and decided that I would just fool around with other optimization techniques, like gradient descent, which I discovered in machine learning.
- **Discovering Lean**
The recent explosion of developments in artificial intelligence and machine learning is truly fascinating, especially in the fields of natural language processing and large language models. One specific moment that made the rounds in the mathematics community was the release of AlphaGeometry. This was quite a big moment, and it surprised many people who had not realized the pace of progress, including myself! When I looked at the paper, I found this sentence:
> Geometry stands out among other olympiad domains because it has very few proof examples in general-purpose mathematical languages such as Lean owing to translation difficulties unique to geometry
Mathematical language? What's that? I thought to myself. Well, it turns out that there has been a lot of research on automated theorem proving in formal languages, including lots of work involving AI. This made me extremely curious, and I set out to find more! I found that OpenAI, the creators of the famous ChatGPT, had published a paper in 2022 that used LLMs to solve formal Olympiad problems, using the one and only Lean (version 3, since version 4 was not out yet).
I often looked around for similar work, and eventually stumbled upon DeepSeekMath, by DeepSeek, a Chinese AI company that has achieved incredible results with LLMs! In this paper, they created a 7-billion-parameter model that achieved performance comparable to GPT-4, which is rumored to have 1.8 trillion parameters, over 250x more! Interestingly, this model went on to be used by almost all top achievers in the Artificial Intelligence Mathematical Olympiad (AIMO).
This company published another paper in particular, called DeepSeek-Prover, where they used their DeepSeekMath 7B model to generate Lean 4 code and self-improve. I thought this was extremely interesting, and I wanted to learn more about Lean 4 myself.
So, with all this in mind, I decided that my goal for the summer break would be to formalize the one-constant version of the Newton-Kantorovich theorem. My plan was to take half the summer, around one month, from start to finish.
Overall, I exceeded this time limit, but not by much. The overrun was mostly in the research part on the Newton-Kantorovich theorem, since I had to learn a lot of concepts from more advanced math like functional analysis (I had no clue what a Banach space was, so I couldn't even read the first sentence of the theorem in the paper).
Nevertheless, I'm happy that the formalization in Lean 4 itself took less than a month, from the initial creation of the repository on July 11 until August 7. My progression in learning Lean's syntax was mostly as follows:
- The natural number game for Lean 4
- The Mathematics in Lean book, in the following order: chapters 1, 2, 3.1/3.2, 10, 11, 3, 4, 9
But, of course, I learnt a lot on the spot, using this forum in particular. I would like to thank everyone who helped me solve all my unending type errors and patiently dealt with my questions. In addition, large language models like GPT-4o and Claude 3.5 Sonnet played a major role in helping me break down basic syntax like the pattern-matching brackets, which initially confused me a lot; see the small example below.
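For instance, here is a minimal sketch of the anonymous-constructor angle brackets ⟨⟩ that tripped me up at first (any Mathlib import that provides `obtain` will do):
```lean
import Mathlib.Tactic

-- The angle brackets ⟨⟩ both destructure and build structure-like terms,
-- here an existential statement.
example (h : ∃ n : ℕ, 0 < n) : ∃ m : ℕ, 0 < m + 1 := by
  obtain ⟨n, hn⟩ := h        -- split h into a witness n and a proof hn
  exact ⟨n, Nat.succ_pos n⟩  -- repack a new witness and proof with ⟨⟩
```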
This has been a very fun experience, and I am grateful to have an amazing tool like Lean 4 at my disposal, for free, to play around with. Thanks to all the maintainers, the contributors to Mathlib, and of course, the community in general. :heart:
Last updated: May 02 2025 at 03:31 UTC