Zulip Chat Archive

Stream: CSLib

Topic: Bachelor End Project

Daan Deckers (Jan 31 2026 at 14:24):

Hello everyone,
We are @Marian Luca , @Jēkabs Gritāns , @Momchil Damianov and @Daan Deckers , and we are bachelor students doing our bachelor end project on CSlib under guidance of @Pieter Cuijpers . We hope to make meaningful contributions to CSlib and we look forward to working with you all! Are there some areas/disciplines of CS that you recommend we contribute to?

Kim Morrison (Feb 01 2026 at 06:47):

What are you interested in? What have you learnt recently? What experience do you have with Lean/Cslib? You have to help us out here. :-)

Jēkabs Gritāns (Feb 01 2026 at 13:22):

The theoretical CS course which we've all taken followed chapters 1-4 of Sipser's book (regular languages, context-free languages, Turing machines, decidability).

Personally, I'm most comfortable with regular languages / finite automata, as they've appeared in other courses, and in a research project I did on recurrent neural networks. So I feel like contributing to "Cslib/Computability/" would have the highest chances of success. But I am of course open to other topics :)

Personally, I have zero experience with lean/mathlib/cslib, but we did have to do basic analysis problems in Waterproof (which is a natural language wrapper on top of COQ). I'm also currently reading "Theorem Proving in Lean 4" and "Functional Programming in Lean" and trying to learn the basics of type theory.

Shreyas Srinivas (Feb 01 2026 at 14:22):

Based on your advisor’s interests and your desire to contribute to CSLib, maybe linear recurrence sequences are your thing? Linear recurrences are the discrete counterparts of linear differential equations (which are central to control theory and cyberphysical systems). But they also show up in combinatorics a lot. There are open CS problems about them such as Skolems problem.

The theory gets fairly abstract and connects to abstract algebra as well as number theory, so maybe mathlib people will also be interested.

Shreyas Srinivas (Feb 01 2026 at 14:26):

This would be a substantial amount of work and could proceed independently of other parts of CSLib for a few months.

Shreyas Srinivas (Feb 01 2026 at 14:29):

By Skolems problem, I mean the problem of asking if there can exist an algorithm to decide whether an LRS of order 5 or higher has a zero (and by extension a number of other properties) or not.

Shreyas Srinivas (Feb 01 2026 at 14:30):

And if there is enough material in your library, you could maybe post this problem as a conjecture to deepmind’s formal conjectures repository

Ching-Tsun Chou (Feb 01 2026 at 21:46):

Jēkabs Gritāns said:

Personally, I'm most comfortable with regular languages / finite automata, as they've appeared in other courses, and in a research project I did on recurrent neural networks. So I feel like contributing to "Cslib/Computability/" would have the highest chances of success. But I am of course open to other topics :)

Quite a bit of automata theory, over both finite words and infinite words, already exists in cslib. In particular, the closure results of regular languages under boolean operations, concatenation, and Kleene star are in cslib:
https://github.com/leanprover/cslib/blob/main/Cslib/Computability/Languages/RegularLanguage.lean
One missing piece that no one (as far as I know) is working on is the connection between regular expressions (RE) and regular languages. The definition of RE exists in mathlib but is not connected to regular languages. (Mathlib did not have the closure results for boolean operations until recently and the closures under concatenation and Kleene star are still missing.). So the connection between RE and RL could be a project idea.

Chris Henson (Feb 01 2026 at 22:02):

There are additional properties to be proven of the System F formalization that don't require too many additional definitions. (I can go into more detail if this is an area of interest)

Ching-Tsun Chou (Feb 02 2026 at 03:37):

Formalizing Brzozowski's algorithm for computing the minimal DFA of a regular language is yet another project idea:
https://en.wikipedia.org/wiki/DFA_minimization#Brzozowski's_algorithm

Maximiliano Onofre Martínez (Feb 02 2026 at 04:24):

Hi! Jumping in here as I'm also a bachelor student looking for a project.

I'm interested in writing a type checker for Stlc and proving soundness and completeness. @Chris Henson Would you encourage this, or would you recommend looking into the System F properties instead? I'd be happy to hear the details.

Chris Henson (Feb 02 2026 at 05:44):

Do you have a particular kind of typechecker and reference(s) in mind?

For System F, I was thinking of properties like confluence, strong normalization, and parametricity. For simplicity, you could strip the existing Fsub of extensions down to just ordinary System F or Fω.

Maximiliano Onofre Martínez (Feb 02 2026 at 10:42):

Pierce's TAPL is my main reference for the idea, but find your suggestion about System F very interesting, thanks! I will definitely take a look.

Chris Henson (Feb 02 2026 at 10:55):

I think it's not unreasonable to look at the metatheory of a type checker, but you would need some references beyond TAPL and might find more subtlety then you expect.

For System F, I'll point out a couple of Rocq formalizations you might find helpful as references:

Maximiliano Onofre Martínez (Feb 02 2026 at 11:34):

Thanks for the links. Your blog post was also a great read. I hope to have some updates soon!

Momchil Damianov (Feb 08 2026 at 14:17):

Shreyas Srinivas said:

This would be a substantial amount of work and could proceed independently of other parts of CSLib for a few months.

Are there any resources you would recommend to get started with? I tried looking through CSLib & Mathlib on GitHub to hopefully find some issues (regardless of their state) that would give me some idea of what is currently implemented/being developed with respect to LRSs, but I couldn't find anything at all mentioning them.

Shreyas Srinivas (Feb 08 2026 at 15:18):

There’s nothing at all right now. This is one of the topics where a few different formalisation projects will be necessary before we know what works and what doesn’t.

Shreyas Srinivas (Feb 08 2026 at 15:20):

And that’s another thing. I suggest building a formalisation independently first without the extra pressure of mathlib or CSLib readiness. Formalisation is hard enough when building new definitions are involved.

Momchil Damianov (Feb 08 2026 at 15:51):

Okay, thank you for clearing things up!

Pieter Cuijpers (Feb 10 2026 at 08:54):

Shreyas Srinivas said:

And that’s another thing. I suggest building a formalisation independently first without the extra pressure of mathlib or CSLib readiness. Formalisation is hard enough when building new definitions are involved.

That's a fair point. But how do we go about that in such a way that different people can voice their opinions? First make an entire proof and then put that up for review or comments here? (It would be nice to know from the start there are people interested in reviewing.)

Shreyas Srinivas (Feb 10 2026 at 10:11):

That’s hard to say. Usually people here voluntarily help when they can. But they might also not have time. The reason I am suggesting not to aim for CSLib on day 1 is precisely because the project will very quickly get stalled or steered by reviewers and eliminate the independent input of the students

Shreyas Srinivas (Feb 10 2026 at 10:12):

And ideally more experimentation in the design can be accomplished without CSLib review and more of the students’ ideas can be attempted without being overruled by us in the community

Shreyas Srinivas (Feb 10 2026 at 10:13):

Since you are in Netherlands, it might be simpler to get a second supervisor or even a third one rather than relying on the community to review an ongoing project.

Shreyas Srinivas (Feb 10 2026 at 10:14):

We might not even understand the goals and constraints of the thesis here

Shreyas Srinivas (Feb 10 2026 at 10:15):

Finally once the students are done they can always submit to CPP later this year and get peer reviewed.

Chris Henson (Feb 10 2026 at 10:58):

@Pieter Cuijpers I think that some of the above is a bit extreme, but I agree with some broad points.

Anyone is welcome to make an draft PR to solicit early feedback. How practical it is to move from this point to merged PR is going to depend a lot on the level of experience of the student and if the material they chose easily finds a reviewer.

The areas above I suggested for lambda calculus for instance I would be very happy to review. In an area like this where I'm very comfortable with the material and formalization, I am happy to spend some extra time helping clean things up for a PR. Other areas like linear recurrences mentioned above would require some extra effort on my part but are still possible. I would say this all should be considered on a case by case basis.

The official maintainer team is very small at this point. Fabrizio and myself (a PhD student) are reviewing each PR at this point, with additional input from trusted contributors and a couple of Mathlib maintainers. This will expand in the future, but just to give some immediate context.

Shreyas Srinivas (Feb 10 2026 at 11:42):

Pieter Cuijpers said:

Shreyas Srinivas said:

And that’s another thing. I suggest building a formalisation independently first without the extra pressure of mathlib or CSLib readiness. Formalisation is hard enough when building new definitions are involved.

That's a fair point. But how do we go about that in such a way that different people can voice their opinions? First make an entire proof and then put that up for review or comments here? (It would be nice to know from the start there are people interested in reviewing.)

One thing you could do is invite people to your GitHub repository to review from time to time or just post here. Depending on availability we might very well be happy to give it a pass.

Chris Henson (Feb 10 2026 at 11:44):

Yeah, I think we are converging on the same policy as Mathlib that anyone is welcome to leave reviews. Some frequent contributors who aren't officially maintainers have started to do so naturally.

Shreyas Srinivas (Feb 10 2026 at 11:47):

Fwiw, I should probably tell you that this is a topic (LRS) I have been suggesting for a while. So I can’t guarantee that nobody else has taken it up. There is a reasonable chance that I might attempt an independent formalisation of LRSes around April or May. I can see value in multiple formalisation attempts since they can inform what we put in CSlib and we might find that different styles of formalisation prove useful for different goals.

Momchil Damianov (Feb 12 2026 at 09:11):

Shreyas Srinivas said:

There’s nothing at all right now. This is one of the topics where a few different formalisation projects will be necessary before we know what works and what doesn’t.

I just came across a definition for a "Linear Recurrence" that already exists within Mathlib's Algebra module (Mathlib/Algebra/LinearRecurrence.lean). Is that a different structure from what you are referring to?

Marian Luca (Feb 12 2026 at 10:46):

Chris Henson said:

Do you have a particular kind of typechecker and reference(s) in mind?

For System F, I was thinking of properties like confluence, strong normalization, and parametricity. For simplicity, you could strip the existing Fsub of extensions down to just ordinary System F or Fω.

I would be interested in the suggestions regarding formalizing confluence, strong normalization, and parametricity for System F, if they are still available, @Chris Henson , since it seems that @Maximiliano Onofre Martínez is also interested in it. Do you think it would be better to find something else, so that there wouldn't be multiple people working on the same thing?

The suggestion of @Ching-Tsun Chou about Brzozowski's algorithm seems very interesting as well. Correct me if I am wrong, but it seems Hopcroft's algorithm also isn't formalized.

Chris Henson (Feb 12 2026 at 10:56):

I don't know the scope required for your project, but you can certainly split the work among you for the System F suggestions. What I mentioned are pretty separate things to formalize.

Shreyas Srinivas (Feb 12 2026 at 11:09):

Momchil Damianov said:

Shreyas Srinivas said:

There’s nothing at all right now. This is one of the topics where a few different formalisation projects will be necessary before we know what works and what doesn’t.

I just came across a definition for a "Linear Recurrence" that already exists within Mathlib's Algebra module (Mathlib/Algebra/LinearRecurrence.lean). Is that a different structure from what you are referring to?

It’s a good starting point, but as you can see the file is quite basic. So there’s a lot of work to do to build it up into something usable.

Ching-Tsun Chou (Feb 12 2026 at 17:26):

Marian Luca said:

The suggestion of Ching-Tsun Chou about Brzozowski's algorithm seems very interesting as well. Correct me if I am wrong, but it seems Hopcroft's algorithm also isn't formalized.

Not sure what you mean by "Hopcroft's algorithm". If you mean this:
https://en.wikipedia.org/wiki/Thompson%27s_construction
then you are correct: it has not been formalized. Currently in cslib, the closures of regular languages under boolean operations are proved using DA and the closures under concatenation and Kleene star are proved using NA (in fact, as special cases of the corresponding constructions for ω-regular languages). There has been almost no development in EpsilonNA, where the only nontrivial theorem proved is that an εNA can be converted to an NA.

Marian Luca (Feb 12 2026 at 18:12):

Ching-Tsun Chou said:

Marian Luca said:

The suggestion of Ching-Tsun Chou about Brzozowski's algorithm seems very interesting as well. Correct me if I am wrong, but it seems Hopcroft's algorithm also isn't formalized.

Not sure what you mean by "Hopcroft's algorithm". If you mean this:

I was referring to this:
https://en.wikipedia.org/wiki/DFA_minimization#Hopcroft's_algorithm
This is actually the algorithm for DFA minimization I learned during my Automata & Formal Languages course, as well as what I came across in some compiler construction books.

Ching-Tsun Chou (Feb 12 2026 at 18:39):

It's not formalized. There is no result about DFA minimization in cslib or (as far as I know) in mathlib. There is a "Myhill-Nerode theorem" in mathlib:
https://leanprover-community.github.io/mathlib4_docs/Mathlib/Computability/MyhillNerode.html
But its connection to the minimal DFA:
https://en.wikipedia.org/wiki/Myhill%E2%80%93Nerode_theorem
is not shown. The notion of "right congruence" is defined in PR cslib#278 which should be merged soon (hopefully).

Marian Luca (Feb 12 2026 at 18:48):

I see, thanks a lot for your help!

Last updated: Feb 28 2026 at 14:05 UTC