Zulip Chat Archive
Stream: general
Topic: finite automata and context-free grammars
Stefan Hetzl (Sep 25 2024 at 09:41):
An undergraduate student of mine wants to do a project in Lean and I am looking for a suitable topic. I am planning to formalise some basic results in formal language theory with him. I am currently trying to get an overview of what has already been done and what is currently being done in Lean.
I am aware of what is in MathLib/Computability and also of the pull requests #11311, #15649, #15651, #15654, #16311, #13514, #15895. I am in contact with @Martin Dvořák who has done a lot of work on this topic. Is there anything else to take into consideration?
Currently I am thinking that the pumping lemma for (linear and) context-free languages might be a good topic. It would require the formalisation of derivation trees (which apparently also has not yet been done in Lean), possibly Chomsky normal form and, of course, the actual proof of the pumping lemma.
Are there any opinions on that? Am I missing something?
Lukas Gerlach (Sep 25 2024 at 10:19):
We also have a similar project at TU Dresden advancing mainly through student projects:
https://github.com/knowsys/Formale-Systeme-in-LEAN
One of our students recently formalized a few aspects of context free grammars. This is still work in progress and a lot of results are open. Before that, some work on regular languages has been done. Most prominently establishing equivalence of regular grammars and finite automata.
It's probably worthy to note that we try to stay close to our lecture "Formale Systeme" and deviate quite a bit from the formalizations already present in mathlib.
Maybe this is still interesting for you; feel free to reach out in a DM :)
Martin Dvořák (Sep 25 2024 at 10:30):
FYI, I have a student who is just starting to work on the context-free pumping lemma.
We should perhaps coordinate our efforts in order not to do the same work twice.
I am on vacation this week, but next week I will be free almost any time to discuss and organize the work.
Lukas Gerlach (Sep 25 2024 at 11:52):
The last student project we had here was just finished so currently there is no one actively working on this. We would be on the lookout for students continuing the work in the next semester though :)
Next steps for us would be to fill the current gaps and to start thinking about formalizing push down automata.
I'll be on vacation in the upcoming weeks but maybe a colleague of mine @Stephan Mennicke would be up for meeting with you (if you want to involve us).
It makes sense for us to not just build our own disconnected formalization but to start thinking more about how we can come closer to the ongoing efforts in mathlib :)
Stefan Hetzl (Sep 30 2024 at 09:15):
Yes, sure, we should coordinate.
Ok. Then I am not going to touch the pumping lemma. We will then look at the formalisation of pushdown automata instead. Hopefully we can get to the point of proving that PDAs are equivalent to CFGs.
Martin Dvořák (Sep 30 2024 at 09:25):
Can you get a PDA as a special case of docs#Turing.TM2.Cfg maybe?
Lukas Gerlach (Sep 30 2024 at 09:38):
Martin Dvořák said:
Can you get a PDA as a special case of docs#Turing.TM2.Cfg maybe?
This is likely possible (we had been thinking about this) but we are unsure how convenient this would be.
Martin Dvořák (Sep 30 2024 at 09:40):
I would like @Mario Carneiro to weigh in on PDA vs TM2.
Lukas Gerlach (Sep 30 2024 at 09:41):
On our side, we could focus on getting our work around regular grammars into mathlib and showing equivalence to the existing DFA definition (which requires quite a few adjustments in our code to make it compatible). Let's see if we can find a student who wants to do this in the upcoming semester :)
Feel free to let us know if you think this is useful! So far, I did not see a PR or issue going into this direction.
Stefan Hetzl (Sep 30 2024 at 09:45):
Martin Dvořák schrieb:
Can you get a PDA as a special case of docs#Turing.TM2.Cfg maybe?
We will try that. Thanks for the pointer!
Stefan Hetzl (Sep 30 2024 at 09:47):
Lukas Gerlach schrieb:
On our side, we could focus on getting our work around regular grammars into mathlib and showing equivalence to the existing DFA definition (which requires quite a few adjustments in our code to make it compatible). Let's see if we can find a student who wants to do this in the upcoming semester :)
Sounds good! IIRC the relationship between regular grammars and NFAs is very straightforward. I believe mathlib already contains the equivalence between NFAs and DFAs. Would it be a possible route for you to go via NFAs?
Lukas Gerlach (Sep 30 2024 at 09:50):
Stefan Hetzl said:
Lukas Gerlach schrieb:
On our side, we could focus on getting our work around regular grammars into mathlib and showing equivalence to the existing DFA definition (which requires quite a few adjustments in our code to make it compatible). Let's see if we can find a student who wants to do this in the upcoming semester :)
Sounds good! IIRC the relationship between regular grammars and NFAs is very straightforward. I believe mathlib already contains the equivalence between NFAs and DFAs. Would it be a possible route for you to go via NFAs?
Yes, this should work. I think we already have code for this (using our own NFA definition once again) :)
Stefan Hetzl (Oct 04 2024 at 09:17):
A second student has now contacted me with the wish to also do a bachelor's thesis in Lean. I have recommended that he formalises Parikh's theorem. The proof on wikpedia is based on a certain form of pumping lemma for context-free languages. I will have a look at other proofs in the literature too. But it would make sense to coordinate this also with your student @Martin Dvořák .
Martin Dvořák (Oct 04 2024 at 09:34):
@Alex Loitzl ping
Abdullah Ijaz (Oct 29 2024 at 18:13):
Hi, I'm also a student hoping to do my bachelor's thesis on Lean and computability. Please could you give me suggestions for what theorem I should aim to formalise? Perhaps something related to DFA minimisation or converting DFA to regex with Brzozowski's algebraic method instead of state elimination. I am new to Lean so I'm not sure what a reasonable goal would be.
Martin Dvořák (Dec 19 2024 at 15:21):
@Alex Loitzl 🐓 has just finished the proof of context-free pumping!
https://github.com/AlexLoitzl/pumping_cfg/blob/77bf295e09124eab0086912dd69f2ccc825da410/PumpingCfg/Pumping.lean#L304
Before it can be contributed to Mathlib, we need to review his #19943 that translates context-free grammars to the Chomsky normal form. The code above depends on this PR.
Mehmet Emre (Dec 23 2024 at 14:29):
I have an undergraduate student who is planning to work on formalizing parts of our undergrad-level Formal Languages and Automata course (standard theorems and some exercises). I haven't seen an issue tracking this on mathlib4 repo. Just to avoid duplicate effort, what are some areas that can use some help implementing that nobody is working on?
Some areas we're interested in that I haven't seen in mathlib4 docs are:
- any missing parts in existing efforts for Regular Expression<->DFA equivalence (thompson's construction and the standard elimination algorithm).
- closure of regular languages under set operations using the Cartesian product trick.
- closure of regular languages under string homomorphism (which is pretty close to
DFA.comap
). We can work on other closure proofs too. Eventually, I want to transcribe my homework solutions to Lean so this is useful to us for that reason. - defining PDAs and showing equivalence between different flavors of PDAs (for example, Sipser defines PDAs to push/pop one symbol at a time but Linz defines them to always pop one symbol and push a string of symbols).
- deriving some PDA closure properties (closure under intersection with a DFA).
- PDA<->CFG conversion.
We will be able to work on these starting in ~a month but I wanted to make sure that what we are working on synergizes well with other ongoing efforts. We will be able to do probably only 2--3 of these in the next few months.
Martin Dvořák (Dec 23 2024 at 15:19):
I think @Tobias Leichtfried is working on the PDA<->CFG conversion.
Also, before you start proving equivalences between the PDA and other pushdown-like automata, consult the PDA definition with him, please. You both should work the same "main" definition of a PDA.
Martin Dvořák (Dec 23 2024 at 15:23):
FYI, intersections are the only closure properties I did NOT do in terms of grammars, so they are certainly good targets for automata-based formalization!
Mehmet Emre (Dec 24 2024 at 08:48):
Sounds good. I just implemented set operation closure properties for DFAs to get a feel of mathlib, will create a PR soon. We can work on regular language-related formalizations until @Tobias Leichtfried pushes a PDA definition.
Stefan Hetzl (Dec 25 2024 at 08:53):
@Mehmet Emre You can check out Tobias' formalisation at the PDA branch of my autth repo. The definition of PDA is found in PDA.lean. The formalisation follows essentially the proof in the Hopcroft/Ullman book. The equivalence between CFG and PDA with empty stack is already done. The equivalence between PDA with empty stack and PDA with final state is partially done and still work in progress.
Last updated: May 02 2025 at 03:31 UTC