Zulip Chat Archive

Stream: CSLib

Topic: Language.IsRegular in mathlib

Ching-Tsun Chou (Oct 28 2025 at 18:50):

I just noticed that the notion of regular languages in mathlib, Language.IsRegular, requires that the state type of the automaton to be Type 0, not Type _:

/-- A regular language is a language that is defined by a DFA with finite states. -/
def Language.IsRegular {T : Type u} (L : Language T) : Prop :=
  ∃ σ : Type, ∃ _ : Fintype σ, ∃ M : DFA T σ, M.accepts = L

This means that if we are not going to redefine the notion of regular languages in cslib, we will probably have to impose the same requirement on ω-regular languages as well, because the theory of regular languages a necessary part of the theory of ω-regular languages. Personally I think this is fine, because requiring a finite type to be in Type 0 does not seem an excessive requirement. But I just want to point it out.

Markus Himmel (Oct 29 2025 at 05:40):

docs#Language.isRegular_iff gives the equivalence to the universe-polymorphic version.

Ching-Tsun Chou (Oct 29 2025 at 06:32):

Thanks! I didn't notice that. Two questions:

Why was isRegular_iff not taken as the definition in the first place?
What is the key idea behind the proof of this theorem? Is it the existence of an equivalence (≃) between a finite type in any universe and a finite type in Type 0?

Ching-Tsun Chou (Oct 29 2025 at 06:52):

Alas, I'm getting an inkling of why (1) was not done. I have a proof of:

theorem regular_compl {l : Language Symbol} (h : l.IsRegular) : (lᶜ).IsRegular

If I use the universe-polymorphic version of IsRegular, the proof breaks and I get the error message below which I don't understand at all:

declaration Language.regular_compl contains universe level metavariables at the expression
∃ (State : Type ?u.151), ∃ (dfa : Cslib.DFA.{?u.151, u_1} State Symbol), DFA.language.{?u.151, u_1} dfa = lᶜ
in the declaration body
fun {Symbol} [Finite Symbol] {l} h =>
Eq.mpr (id (congrArg (fun _a => _a) (propext regular_iff_cslib_dfa.{u_1, ?u.151})))
(Exists.casesOn (Eq.mp (congrArg (fun _a => _a) (propext regular_iff_cslib_dfa.{u_1, ?u.151})) h)
fun (State : Type ?u.151)
(h : ∃ (dfa : Cslib.DFA.{?u.151, u_1} State Symbol), DFA.language.{?u.151, u_1} dfa = l) =>
Exists.casesOn h fun (dfa : Cslib.DFA.{?u.151, u_1} State Symbol) (h : DFA.language.{?u.151, u_1} dfa = l) =>
h ▸ Exists.intro State (Exists.intro (DFA.compl.{?u.151, u_1} dfa) (DFA.compl_lang.{?u.151, u_1} dfa)))

I remember that in my personal automata theory project:
https://github.com/ctchou/AutomataTheory
I also got into a lot of trouble with making the type of automata state universe-polymorphic and ended up using Type 0. I had thought that the trouble was caused by the bundled design of the automata type (where the state type is one of the fields of the automata structure). But now the automata type in cslib is unbundled and we still get into trouble with universe polymorphism.

Markus Himmel (Oct 29 2025 at 06:53):

Ching-Tsun Chou said:

Thanks! I didn't notice that. Two questions:

Why was isRegular_iff not taken as the definition in the first place?

What is the key idea behind the proof of this theorem? Is it the existence of an equivalence (≃) between a finite type in any universe and a finite type in Type 0?

This is so that for every L there is exactly one proposition Language.IsRegular L, not one for each universe level.
Yes.

Markus Himmel (Oct 29 2025 at 06:55):

Yes, the error message basically means that if you just write l.IsRegular, then Lean has no way of knowing which universe σ is supposed to come from. What you would want from a universe-polymorphic version is ∃ u : Level, ∃ σ : Type u, ∃ _ : Fintype σ, ∃ M : DFA T σ, M.accepts = L, but this is not something you can write down in Lean. Instead, for every u you get a IsRegular.{u} L, which is not what you want.

Ching-Tsun Chou (Oct 29 2025 at 07:06):

I'm not sure I really understand this. But my sense is that it is not worthwhile to struggle with universe variables. I'll stick to using Type for state types.

Fabrizio Montesi (Oct 29 2025 at 19:20):

Maybe it's time I raise the issue of whether/when/how we should port all of automata and language theory from mathlib to cslib, since Language.IsRegular uses mathlib's DFA, not cslib's DFA. So it could be a hassle to use.

Fabrizio Montesi (Oct 29 2025 at 19:21):

Though @Ching-Tsun Chou has just pointed out to me that he's bridging this quite nicely for now, so I should note this here as well. :-)

https://github.com/ctchou/cslib/blob/regular-language-work/Cslib/Computability/Languages/RegularLanguage.lean

Ching-Tsun Chou (Oct 29 2025 at 19:25):

I'll make the above file and associated changes into a PR pretty soon, after I prove the closure of regular languages under boolean operations (which I had thought were in mathlib but don't actually seem to be). Closure under concatenation and Kleene star will take (quite) more work.

Chris Henson (Oct 29 2025 at 19:27):

This is good to consider, we certainly would like to maintain as much compatibility as possible. Relatedly I've been meaning to ask if the recent activity at #mathlib4 > Regular languages: the review queue is relevant to what's being worked on in CSlib. (I have not read all the PRs there, so I don't know myself)

Ching-Tsun Chou (Oct 29 2025 at 19:30):

I didn't know about that at all!

Ching-Tsun Chou (Oct 29 2025 at 19:50):

My general sense is that I'll prove what I need along the way. If new additions are made to mathlib, I'll use them.

Chris Henson (Oct 29 2025 at 19:53):

Ah, well glad I mentioned it. I would appreciate your keeping an eye on what is happening there since you are probably most familiar with how it will affect us. Proving what you need and periodically checking for relevant Mathlib additions sounds fine to me.

Ching-Tsun Chou (Oct 30 2025 at 22:25):

Someone should make a website tracking the correspondence between mathlib PR activities and existing mathlib theories. It would be very nice if a webpage like the following:
https://leanprover-community.github.io/mathlib4_docs/Mathlib/Computability/DFA.html
also contains pointers to the PRs discussed in the thread that Chris forwarded. Personally I don't have the bandwidth to browse Zulip sufficiently to keep a tap of what's going on.

Eric Wieser (Oct 31 2025 at 07:17):

What's the benefit of Cslib.DFA over docs#DFA ? The latter looks strictly more general to me, and I think the former was explicitly transformed into the latter when PRd to Mathlib

Ching-Tsun Chou (Oct 31 2025 at 07:59):

I think there are still some uncertainties about many of the definitions; see the discussion in this thread: #CSLib > Question Lean structure and extends

Chris Henson (Oct 31 2025 at 08:12):

If there is not a good reason for it to be bundled as I advocate in that thread, I don't see why we couldn't use the Mathlib definition. (I understand Cslib.NFA is currently different because it extends LTS)

Fabrizio Montesi (Oct 31 2025 at 11:02):

Eric Wieser said:

What's the benefit of Cslib.DFA over docs#DFA ? The latter looks strictly more general to me, and I think the former was explicitly transformed into the latter when PRd to Mathlib

Mathlib's DFA = Cslib's DA + the theorems in Cslib's DFA.

(As I agree with Chris that there's no good reason to bundle things, as things currently stand.)

Alex Meiburg (Oct 31 2025 at 13:29):

Tangentially - on the topic of the actual mathematical meaning of universes here: In the "standard" presentation of deterministic automata, we require a finite state space (Fintype σ); relaxing this to allow for infinite state spaces (as a larger class of automata) isn't interesting, because then we can add a state for every possible string over T, and so there is an automaton for every language.

But, that really only works when T and σ are the same universe. The proposition

def Language.IsRegular' {T : Type u} (L : Language T) : Prop :=
  ∃ σ : Type 0, ∃ M : DFA T σ, M.accepts = L

is equivalent to True when u is universe 0, but it becomes something nontrivial when u is a higher universe. Of course this is a version of the more general idea that you get something nontrivial as long as the alphabet is an infinite set with larger cardinality than your state space; e.g. you can talk about an automaton that accepts languages written over the alphabet of real numbers, but with only a countable state space.

Shreyas Srinivas (Oct 31 2025 at 14:07):

There is an earlier thread arguing for an LTS definition and making NFA and DFA depend on it because LTSes are far more general structures

Last updated: Dec 20 2025 at 21:32 UTC