Zulip Chat Archive

Stream: new members

Topic: First contribution


Patrick Johnson (Dec 18 2021 at 04:50):

About a month ago, I started learning Lean and joined this Zulip chat, but I never actually introduced myself. I'm an undergrad IT student from eastern Europe. I was using Isabelle/HOL in the past and decided to move to Lean because dependent types are definitely more expressive than simple types. I am looking for some simple lemma to start my first contribution to mathlib!

I skimmed through https://leanprover-community.github.io/undergrad_todo.html, but I'm not too much experienced in any of those. I am familiar with undergrad math (except topology and vector spaces), however I'm not on that level that I could easily prove complex theorems in advanced math fields. Instead, my study program is particularly focused on algorithms and data structures. I know a wide range of algorithms, data structures, their complexities and properties.

So I want to ask for an advice on what lemmas about algorithms and data structures are currently missing in mathlib. I have some ideas, but I don't want to reinvent the wheel if what I'm thinking about already exists, but I'm not aware of it.

Eric Wieser (Dec 18 2021 at 05:42):

FWIW, https://github.com/leanprover-community/mathlib/wiki/Undergrad-TODO-trivial-targets is a more accessible version of that yaml file, which points out where the easiest bits are.

Johan Commelin (Dec 18 2021 at 06:37):

@Patrick Johnson Welcome! I agree that the list that Eric posted is a very good way to get started if you want to add some maths.

Johan Commelin (Dec 18 2021 at 06:40):

There hasn't been much focus on verified algorithms in mathlib so far. So the best thing to do would be to pick something you like, and ask if it's already done on this chat. Most likely it won't be done yet.

Johan Commelin (Dec 18 2021 at 06:40):

Basically, we only have a verified quicksort algorithm :see_no_evil: (I'm exaggerating a little bit.)

Mario Carneiro (Dec 18 2021 at 06:48):

I've started working on some verified algorithms in mathlib4, like mathlib4#136 . For various reasons it didn't seem to be as useful in lean 3

Kevin Buzzard (Dec 18 2021 at 09:14):

I guess we've been concentrating on the maths which gets taught in maths departments, and in maths departments all this algorithm analysis is thought of as a bit niche (because a lot of modern mathematics doesn't focus on algorithms). In algebraic number theory there are people who study the computational side of things ("this factoring algorithm is quicker than this one") but that's only one small area and the analysis is usually not at all deep

Yaël Dillies (Dec 18 2021 at 09:37):

If you're interested in automata, my branch branch#regex_nfa needs some love. It constructs a NFA from a regular expression. I haven't yet had time to clean it up, so feel free to take over.

Yaël Dillies (Dec 18 2021 at 09:38):

In combinatorics, we have a lot of these bounds, but as Kevin said the analysis is usually not that deep.

Yaël Dillies (Dec 18 2021 at 09:40):

However, #10645 does aim to verify an algorithm. The algorithm in questions calculates small Roth numbers.

Patrick Johnson (Dec 18 2021 at 10:15):

Thanks for the suggestions. As I can see, Lean 3 leans towards abstract maths rather than actual algorithms, which makes perfect sense for a theorem prover. Is the interest going to shift expand towards practical algorithms in Lean 4? I can imagine big corporations using Lean 4 as a tool for verifying the correctness of their complex systems. Currently, there is very little tools for that in mathlib.

Regarding the Trivial TODO list, it's nice, but I don't see much of a point in stealing such a good starting lemmas from someone who studies maths and who would enjoy proving those lemmas much more than me.

For the algorithms, I was thinking about implementing and verifying: all kinds of sorting algorithms and comparing their performances, binary tree search algorithms, AVL trees, RedBLack trees, Bellman-Held-Karp and TSP, Dantzig and Gomory's cutting plane algorithm, SAT algorithms (DPLL, CDCL, Cube-and-Conquer, brute-force, ...), Dijkstra's graph algorithms, dynamic programming, A* graph search, Nelder-Mead simplex, serializing and deserializing common data structures, linked list operations, etc.

I was also thinking about formalizing some interesting, but probably not very useful theorems, such as: cellular automata (Conway's GoL, Langton's ant, Rule 110, ...), Turing-completeness of various obscure computational models, complexity class of some games (P-space completeness of Sokoban, for example). I also recently read a paper with the solution to the Conway's Angel problem, and I think I understand their solution enough to be able to formalize the proof. But I believe none of these is much of a benefit to mathlib.

So, what do you think? Is any of these ideas worth effort to become a part of mathlib? Or should I wait for Lean 4 to be finished and see what the new bias regarding verified algorithms will be?

Johan Commelin (Dec 18 2021 at 10:18):

Patrick Johnson said:

Is the interest going to shift towards practical algorithms in Lean 4?

I hope it will expand, yes. But not shift, please :wink:

Johan Commelin (Dec 18 2021 at 10:20):

Things like RB trees have a native implementation in Lean 3. Which means they are fast, and implemented in C++, but you can't prove anything about them. There is also a Lean 3 implementation in mathlib, and you can prove things about it, but you can't use them in practice because they are slow.

Yaël Dillies (Dec 18 2021 at 10:20):

All of this sounds very cool and you're very welcome to prove "useless" stuff. That's what the archive. directory is for. Note that we already have docs#rbtree and you might also be interested in docs#ord_node.

Mario Carneiro (Dec 18 2021 at 10:22):

When it comes to formalizing algorithms, you often have to decide whether your interest is theoretical or practical. Are you interested in actually having code that runs efficiently and is used in actual applications, or are you interested in proving correctness and/or runtime bounds? The two are almost entirely orthogonal and neither is trivial

Mario Carneiro (Dec 18 2021 at 10:27):

For mathlib4, I'm focusing on practical algorithms, which biases toward simple data structures like red-black trees, dynamic arrays, hash maps, priority queues; common reusable algorithms like Dijkstra, string searching and regex compilation (applied to real unicode strings, not an abstract alphabet); and algorithms specific to theorem proving like Knuth-Bendix completion, unification, SAT algorithms, congruence closure, and quantifier elimination.

Mario Carneiro (Dec 18 2021 at 10:29):

One interesting project which is fairly self contained is to build a high quality sort algorithm. Lean has never had anything more complicated than quicksort and mergesort

Eric Taucher (Dec 18 2021 at 11:02):

Mario Carneiro said:

For mathlib4, I'm focusing on practical algorithms,..., unification, ...

Any particular unification? I myself have an interest in Syntactic unification of first-order terms

As this is moving off topic, feel free to move to a new topic.

Marc Huisinga (Dec 18 2021 at 12:12):

Mario Carneiro said:

One interesting project which is fairly self contained is to build a high quality sort algorithm. Lean has never had anything more complicated than quicksort and mergesort

TimSort? :grinning_face_with_smiling_eyes:


Last updated: Dec 20 2023 at 11:08 UTC