Zulip Chat Archive

Stream: computer science

Topic: Automata theory in Lean4

Ching-Tsun Chou (Jul 30 2025 at 19:34):

This is the discussion thread for #announce > Automata theory in Lean4 .

Kim Morrison (Jul 31 2025 at 00:42):

Do you think it would be possible to find an implementation of NFA and DFAs that is suitable both for your work and what currently exists in Mathlib? It would be great to unify these again, perhaps with a view to learning from these two experiences and putting the synthesis in the new CSLib?

Ching-Tsun Chou (Jul 31 2025 at 06:56):

For my purpose, the main inconvenience of the current definitions of NFA and DFA in mathlib is that the set of accepting states are bundled in as a part of the NFA or DFA structure. This makes sense if one only has to deal with finite executions. But for automata on infinite words, the meaning of the accepting states is very different and in fact there are several different notion of acceptance with differently structured accepting states (see the definitions BuchiAccept, MullerAccept, etc in https://github.com/ctchou/AutomataTheory/blob/main/AutomataTheory/Automata/Basic.lean). By moving the accepting states out of the structure of an Automaton, I was able to use the same automata constructions to prove the same or similar closure properties for both regular and ω-regular languages.

On the other hand, I found it convenient to include the type of the states as a field of the Automaton structure, while Mathlib's NFA and DFA have that type as an input parameter. By bundling the state type, I can very easily define various constructions on automata without having to separately define the corresponding constructions on the state types (for example, see practically every file in https://github.com/ctchou/AutomataTheory/blob/main/AutomataTheory/Automata/). I do have to make separate definitions about accepting states, but that is necessitated by the math because the same construction tends not to work for both finite and infinite words.

Another minor difference is that I don't have ε-transitions in my theory. In textbooks the ε-transitions are typically used to smooth the presentation of the proofs of various closure properties of regular languages. But one can do the same proofs without them, at the expense of a slightly less transparent proof in the case of concatenation.

Fabrizio Montesi (Jul 31 2025 at 07:52):

I don't think these differences are insurmountable, but some work needs to be done, I agree.

Fabrizio Montesi (Jul 31 2025 at 07:53):

Also, your notions of acceptance can probably be easily defined on top of the Lts type we already have. There we have already come up with some strategies. (And vice versa we might benefit from some of your definitions.)

Shreyas Srinivas (Jul 31 2025 at 08:39):

Ching-Tsun Chou said:

Another minor difference is that I don't have ε-transitions in my theory. In textbooks the ε-transitions are typically used to smooth the presentation of the proofs of various closure properties of regular languages. But one can do the same proofs without them, at the expense of a slightly less transparent proof in the case of concatenation.

Epsilon NFAs are actually superfluous and can be accomplished by raising the alphabet type to an Option type

Shreyas Srinivas (Jul 31 2025 at 08:40):

Have you used Mathlib's Stream API? Last I checked, it already had some API that would make infinite words convenient to handle

Shreyas Srinivas (Jul 31 2025 at 08:40):

Infinite trees for tree automaton will be much harder

Shreyas Srinivas (Jul 31 2025 at 08:52):

Also @Ching-Tsun Chou is the repository open for pull requests?

Fabrizio Montesi (Jul 31 2025 at 12:42):

Shreyas Srinivas said:

Have you used Mathlib's Stream API? Last I checked, it already had some API that would make infinite words convenient to handle

+1 for the Stream API, I'm already using it for the definition of divergence in LTS. (Well, Stream', to be precise.)

Fabrizio Montesi (Jul 31 2025 at 13:04):

Btw, here's how a definition based on Lts would look like, since a DFA is a transition system.

import Cslib.Semantics.Lts.Basic

structure Dfa (State : Type _) (Label : Type _) where
  -- The transition system of the automaton
  lts : Lts State Label
  -- Start state
  start : State
  -- Accept states
  accept : Finset State
  -- The automaton is finite-state
  finite_state : lts.FiniteState
  -- The automaton is deterministic
  deterministic : lts.Deterministic

Then one can reuse the theorems for Lts. Alternatively, one can make custom definitions for the transition relation, finite_state, and deterministic, make a toLts function and then prove that the custom properties are the same as defined in the Lts library. Then one could go back and forth between the results for Lts and automata, I think.

(This is just a sketch, one can move a few other things around.)

Fabrizio Montesi (Jul 31 2025 at 13:06):

@Ching-Tsun Chou The rationale in Lts is to keep things as unbundled in the general lib, then one can bundle things as they please for their proofs more down the line. I think some of your defs (like for executions and accepting runs) might apply (almost?) directly to all transition systems.

Fabrizio Montesi (Jul 31 2025 at 13:08):

I've made a playground here https://github.com/cs-lean/cslib/tree/automata in case anybody wants to play with this.

Shreyas Srinivas (Jul 31 2025 at 13:41):

Another tidbit. For non determinism I would use the RTC construction to construct the transition relation of a run

Alex Meiburg (Jul 31 2025 at 14:26):

Even if the definitions are fundamentally incompatible, there could be some Equivs and theorems proving some correspondences (perhaps in restricted cases).

Shreyas Srinivas (Jul 31 2025 at 14:30):

In this case I think Ching-Tsun's API is more general than the current Mathlib API.

Ching-Tsun Chou (Jul 31 2025 at 19:37):

Thanks for the comments! I have introduced some infix and postfix notations for operations on languages and ω-languages yesterday (see https://github.com/ctchou/AutomataTheory/blob/main/AutomataTheory/Languages/Basic.lean). I plan to find some better notations for operations on sequences as well. BTW, if anyone has a better idea for the issue I raised in #new members > Unicode notation question , please let me know.

About the Stream API: I'm not sure it will help, because then you need to talk about whether an infinite stream actually contains infinitely many non-none letters to make up an infinite word. My experience suggests that that would be a messy business. In any case, I think the ε-transitions are just a technical device for making certain proofs look better on paper. We don't really need them once those proofs have been done without them.

In my formalization the deterministic automaton is not a special case of a nondeterministic automaton (Automaton), but a separate class (DetAutomaton) with a unique initial state and a next-state function and a .toNA function to convert it to a nondeterministic automaton (https://github.com/ctchou/AutomataTheory/blob/main/AutomataTheory/Automata/Det.lean). I actually started with DA being a special case of NA, but found later that having two separate classes is more convenient. For one thing, I can easily define a function (DetAutomaton.DetRun) that maps an input word to a run and can easily prove properties about it. The power-set construction (https://github.com/ctchou/AutomataTheory/blob/main/AutomataTheory/Automata/PSet.lean) and the congruence construction (https://github.com/ctchou/AutomataTheory/blob/main/AutomataTheory/Automata/Congr.lean) very naturally produce DetAutomaton.

Also, the finite-state assumption is almost never needed in proving properties about various automata constructions. Basically I start to introduce the finite-state assumption only as part of the definitions of regular and ω-regular languages, which do require the automata accepting them to be finite-state. You can search for the string "Finite " to find out where the finite-state assumption enters into the theory. (BTW, the automata theory in mathlib also doesn't assume NFA and DFA are finite-state, despite the "F" in their names.). Incidentally, I never have to assume that the alphabet type A is finite anywhere. (Of course, this generality is illusory, because a finite-state machine can only make finitely many distinctions on an infinite alphabet.)

Finally, I'd like to note that the stuff in:
https://github.com/ctchou/AutomataTheory/blob/main/AutomataTheory/Sequences/Segments.lean
may be of interest independently of automata theory.

Ching-Tsun Chou (Aug 01 2025 at 06:52):

Sorry, I misunderstood what @Shreyas Srinivas wrote about Stream' above. Please ignore what I've crossed out above. I will look into Stream' to see how to use it.

Ching-Tsun Chou (Aug 01 2025 at 07:10):

I think I confused Stream' with Stream, which does contain optional values. They really should have chosen a different name.

Ching-Tsun Chou (Aug 01 2025 at 08:07):

I looked into Stream'. I don't think I'll use it. The API of Stream' seems designed to support working with infinite sequences without ever using explicit indices to access the sequence's elements. This is exactly the opposite of what most proofs in my theory do, where direct indexing (and hence the omega tactic) are used everywhere. There are two functions in Stream' which I ended up duplicating in my theory (AppendListInf and SuffixFrom, which correspond to appendStream' and drop in Stream'). But in order to use the Stream' versions, I would have to change all the type declarations from ℕ → * to Stream' *, which is more trouble than it is worth.

Last updated: Feb 28 2026 at 14:05 UTC