## Stream: general

### Topic: BSc Final Project

#### Tomaz Gomes Mascarenhas (Dec 22 2020 at 00:31):

Hi all! I'm close to finish my BSc in Computer Science and, as my final project, I'm planning to formalize some data structure using Lean. If possible, I would like to formalize one that wasn't formalized yet on mathlib and will be of some relevance to it. One possibility is to choose one from Benjamin's Pierce course on formalization of Algorithms (https://softwarefoundations.cis.upenn.edu/vfa-current/toc.html) and adapt it to Lean. What do you think? Any suggestions?

#### Simon Hudon (Dec 22 2020 at 00:57):

I suggest you start with the easiest ones to see what tools Lean might be missing. You may have to write programming-specific tactics to help yourself complete your tasks. Having a slow buildup in difficulty should also help you choose the problem to focus on.

#### Kevin Lacker (Dec 22 2020 at 16:41):

there's some stuff for insertion sort and merge sort in data/list/sort.lean, so that seems like a good place to look

#### Kevin Lacker (Dec 22 2020 at 16:41):

it doesn't seem very fleshed out, in the sense that there are many other ways to sort arrays that are not mentioned

#### Kevin Lacker (Dec 22 2020 at 16:42):

also, there are kind of two main facts about sorting arrays - the fact that algorithms actually do sort arrays, and the asymptotic analysis of how much time it takes

#### Kevin Lacker (Dec 22 2020 at 16:42):

nothing currently in the sort logic mentions the time complexity

#### Kevin Lacker (Dec 22 2020 at 16:42):

I'm not sure if anything anywhere in Lean is mentioning the time complexity of algorithms? but I think something along these lines would be interesting

#### Kevin Lacker (Dec 22 2020 at 16:43):

my instinct would be to focus on sorting rather than red-black trees and priority queues, because sorting is clearly incomplete and it'll be easier to do than the more complex data structures

#### Mario Carneiro (Dec 22 2020 at 19:12):

Kevin Lacker said:

it doesn't seem very fleshed out, in the sense that there are many other ways to sort arrays that are not mentioned

I don't think the current implementations are intended to show off all possible sort algorithms; there is just one simple one to show that it is possible, and one efficient one in case you just need to sort stuff

#### Mario Carneiro (Dec 22 2020 at 19:13):

Kevin Lacker said:

nothing currently in the sort logic mentions the time complexity

It's not really possible to formalize time complexity of (shallowly embedded) lean algorithms. You need an actual theory of computation for that, e.g. computability/ and even then you probably won't get something practically usable

#### Kevin Lacker (Dec 22 2020 at 19:54):

for sorts specifically I was thinking you could implement them in terms of a thing that counts the number of comparisons? i dunno if it would be practically usable, but maybe within scope for an undergrad project

#### Kevin Lacker (Dec 22 2020 at 19:54):

I only refer to this as "not fleshed out" in the sense of "still some stuff an undergrad could do" rather than "this library is insufficient for some practical reason"

#### Mario Carneiro (Dec 22 2020 at 19:56):

It's certainly possible to have a lightweight formalization of time or comparison complexity enough to prove the asymptotic orders of some sorting algorithms, and that could indeed be an undergrad project

#### Eric Wieser (Dec 22 2020 at 20:07):

I'm not sure how you'd go about counting comparisons or similar in a lean function without adding substantial bookkeeping

#### Kevin Lacker (Dec 22 2020 at 20:14):

the sort algorithms are just so simple, I figure you would be able to add the substantial bookkeeping

#### Mario Carneiro (Dec 22 2020 at 20:22):

The usual approach is to use a state monad that counts whatever you want to count

#### Simon Hudon (Dec 22 2020 at 21:09):

You can also consider formalizing the operational semantics of a small programming language with if, rec, compare let and array operations and counting the number of evaluation steps for arbitrary inputs

#### Tomaz Gomes Mascarenhas (Jan 17 2021 at 23:28):

Thank you all! I decided to make a lightweight formalization of the time complexity of the sorting algorithms from mathlib. I just have one doubt, is it possible to add this bookkeeping to the existing insertion sort and merge sort, or the idea is to write new functions that use the state monad to count comparisons?

#### Johan Commelin (Jan 18 2021 at 05:02):

@Tomaz Gomes Mascarenhas I don't think you can add it to the existing functions, because you would need to change their type to somehow make room for the bookkeeping... but we don't want their types to change.

#### Alex J. Best (Jan 18 2021 at 05:20):

I wondered before if using typeclasses one could do this, if you have an algorithm for rings that you want to measure in terms of ring operations for instance, you can make a dummy type with has_add, has_neg, has_mul, has_one etc that are defined to count the number of basic operations needed to form that expression.
Then calling the same function for these dummy types would count the total number of operations.
But I couldn't get it to work satisfactorily.

#### Alex J. Best (Jan 18 2021 at 05:22):

The example I tried to do was karatsuba matrix multiplication but somehow getting the right number of operations out of the final matrix was hard. The nature of my setup double counted terms appearing twice, where as an actual implementation wouldn't recompute the term :confused:

#### Johan Commelin (Jan 18 2021 at 05:31):

otherwise it sounds like a nifty trick (-;

#### Eric Wieser (Jan 18 2021 at 09:00):

Presumably you can count ring operations by quotienting R \times count by just the first variable, then use quotient.out to get the count?

#### Eric Wieser (Jan 18 2021 at 09:01):

That doesn't work for counting operations that don't directly map into the result though, as your example of a sort needs

#### Joe Hendrix (Jan 18 2021 at 17:52):

I think for a thesis on this, I'd just define a simple imperative language in Lean with arrays, write the operational semantics, define a cost model, and then start proving properties about it.
For properly proving properties about the operational semantics of a Lean program, you need an interpreter or compiler, and that's a lot of work to sign up for up front. Once you have the basic imperative language, you instead have a library you could extend in a bunch of directions (e.g., get closer to a real machine semantics, more accurate cost models, automation).
This assumes you are interested primarily in the complexity analysis. If you are more interested in the foundations of Lean, this would take you afield from that.

Last updated: May 14 2021 at 22:15 UTC