Zulip Chat Archive

Stream: mathlib4

Topic: Automata Theory Contribution

Ayden Lamparski (Dec 08 2025 at 23:45):

Hi all, I’m interested in contributing some automata theory to computability. I have been working on this project for several months and have been able to formalize quite a number of standard results of the theory that are not presently a part of mathlib. I wanted to check whether the direction I have in mind is a good fit for mathlib, and I am looking for maintainers to review the eventual pull requests.

Here is a link to the github repo.

In brief: At present, the computability folder has a definition of deterministic automaton in Computability/DFA. (The ‘F’ in ‘DFA’ is a bit of a misnomer, since as defined these automata are not necessarily finite.) I use this definition in my work. It appears that there is nothing in the present folder showing the existence of a unique minimal deterministic automaton for each language, nor of the algorithm for constructing the minimal automaton in the finite case.

These are standard constructions in automata theory and are critical in proving results in the algebraic theory of formal languages.

Below is a more detailed summary of what I have done so far.

DFA morphisms

I define a morphism between DFAs M and N as a map between their state types that

sends the start state of M to the start state of N,
sends accepting states of M to accepting states of N, and
is compatible with the transition functions.

From such a morphism M ⟶ N we prove that the language accepted by M is equal to language accepted by N.

Here is a self-contained file that defines these morphisms. Should I make a PR that adds these definitions to Mathlib.Computability.DFA or should I add it in a new file (Mathlib.Computability.DFAHom)?

Accessibility

A state of a DFA is accessible if it is reachable from the start state by some word. A DFA is accessible if all of its states are accessible. I define:

accessibility for states and a typeclass expressing that a DFA is accessible,
a partial order on DFAs by M ≤ N iff there exists a surjective DFA morphism N ⟶ M, and
a “DFA.toAccessible” function: given a DFA, restrict its state type to the accessible states to obtain an accessible DFA accepting the same language. This construction preserves Fintype and DecidableEq instances on the state type, so it is “computable” in the sense that it really produces a finite DFA when you start with one.

Here is a link to the file with these constructions.

DFA minimization

On top of this, I have implemented a minimization algorithm that

takes as input a DFA with Fintype / DecidableEq state type
outputs a DFA accepting the same language

The resulting DFA is minimal with respect to the surjective-morphism order above. As a consequence it has the smallest number of states among DFAs accepting the language, and is the unique (up to isomorphism) DFA with this property.

I further prove that this construction preserves Fintype and DecidableEq. In fact, the automaton constructed by the algorithm is equivalent (isomorphic) to the Nerode automaton of a language, which is already in mathlib (Computability/MyhillNerode). However, the existing definition in mathlib builds this automaton directly from an arbitrary language; since it doesn’t start from a DFA with a finite state type, the resulting automaton is not shown to carry a Fintype or Finite instance even when the language is regular. Nor does the existing definition show that the resulting automaton is minimal.

By contrast, this minimization procedure starts from a DFA with a finite state type and stays within that world. In order to make this algorithm computable, I had to define another typeclass on DFAs, which I call Fin, defined here. This typeclass provides the set of accepting states of a DFA as a Finset rather than a Set. Without this typeclass, it would be impossible to decide word membership in a DFA’s language, even if the state and alphabet types have Fintype and DecidableEq instances.

Longer-term goals

The main motivation is to set up infrastructure for later results such as:

the syntactic monoid of a regular language (transition monoid of a minimal DFA),
the characterization of star-free languages via aperiodic syntactic monoids (Schützenberger’s theorem),

connections to finite semigroup theory, which currently seems sparse in mathlib. I am currently participating in a parallel project aimed at formalizing the algebraic theory of finite semigroups in Lean, and these can be found in this github repo.

Shreyas Srinivas (Dec 09 2025 at 00:22):

Hey, I think this might fit in CSLib

Snir Broshi (Dec 09 2025 at 00:22):

Hello :wave:

You should consider #CSLib (https://github.com/leanprover/cslib @Fabrizio Montesi @Chris Henson).
The computability folder still accepts PRs sometimes, and there isn't a clear policy about its relation with CSLib, but the automata in CSLib are a consolidation of many efforts from multiple people into one coherent and well thought out design. The project is still early days, but minimizing DFAs is a great addition.

As for the project itself: I like the morphism & accessibility concepts, looks good! Also I think your Fin class can and should be replaced by Fintype M.accept, it looks equivalent (docs#Set.toFinset).

Snir Broshi (Dec 09 2025 at 00:24):

Also worth checking out: #mathlib4 > Regular languages: the review queue

Snir Broshi (Dec 09 2025 at 00:25):

and #CSLib > Automata

Chris Henson (Dec 09 2025 at 00:32):

Pinging @Ching-Tsun Chou because as the author he is most knowledgeable about CSLib's automata.

Ching-Tsun Chou (Dec 09 2025 at 06:14):

Currently the automata theory in CSLib has the definitions of deterministic and nondeterministic automata and the finite-word, Büchi, and Muller acceptance conditions, the closure of regular languages under boolean operations, and the closure of ω-regular languages under union and intersection. You can find most of the code under the following directories:

Cslib/Computability/Automata
Cslib/Computability/Languages
Cslib/Foundations/Data/OmegaSequence

in CSLib. My main goal is to port the results I formalized earlier in a personal project:
https://github.com/ctchou/AutomataTheory
into CSLib.

I am not familiar with the algebraic approach to automata theory, but it looks like a good addition to CSLib and will probably be pretty orthogonal to what I have already done and plan to do in automata theory. @Ayden Lamparski, could you give us some pointers to the documents and/or textbooks you are following?

On the mathlib front, the PR for the closure of regular languages under boolean operations (mathlib#31247) has just been merged and the PR for the closure under concatenation seems about to be merged (mathlib#30872).

Fabrizio Montesi (Dec 09 2025 at 14:21):

DFA minimization (and connected topics) is definitely something we'd like to have in #CSLib!

Tanner Duve (Dec 09 2025 at 22:55):

Cool formalization! Here are some notes for a course I TA'd last year that covers the algebraic approach to automata theory in chapter 5. You can define (co)products and pullbacks/pushouts of DFAs using these morphisms as well

Last updated: Feb 28 2026 at 14:05 UTC