Zulip Chat Archive
Stream: batteries
Topic: regular expression derivatives
Paige Thomas (Sep 04 2024 at 04:51):
I was just wondering if anyone else has/is worked/working on the formalization of a lexer in Lean using Brzozowski regular expression derivatives?
Shreyas Srinivas (Sep 04 2024 at 12:43):
I know someone who has done regex derivatives in coq
Shreyas Srinivas (Sep 04 2024 at 12:44):
For omega regular expressions
Shreyas Srinivas (Sep 04 2024 at 12:44):
It was an SRC poster at icfp
Shreyas Srinivas (Sep 04 2024 at 12:44):
It wasn't used for lexing but proving language equivalence
Paige Thomas (Sep 05 2024 at 01:34):
Actually, that is part of what I am stuck on, the check for language equivalence in the algorithm for generating a DFA from a regular expression.
Paige Thomas (Sep 05 2024 at 01:36):
The paper I'm going by defines a similarity relation for comparing regular expressions, but I'm not sure how to go about turning it into a computable function.
https://www.cambridge.org/core/journals/journal-of-functional-programming/article/regularexpression-derivatives-reexamined/E5734B86DEB96C61C69E5CF3C4FB0AFA
Paige Thomas (Sep 05 2024 at 01:39):
In code, the relation is
inductive Similar {α : Type} : RegExp α → RegExp α → Prop
| union_1
(R : RegExp α) :
Similar (RegExp.union R R) R
| union_2
(R S : RegExp α) :
Similar (RegExp.union R S) (RegExp.union S R)
| union_3
(R S T : RegExp α) :
Similar (RegExp.union (RegExp.union R S) T) (RegExp.union R (RegExp.union S T))
| union_4
(R : RegExp α) :
Similar (RegExp.union RegExp.zero R) R
| concat_1
(R S T : RegExp α) :
Similar (RegExp.concat (RegExp.concat R S) T) (RegExp.concat R (RegExp.concat S T))
| concat_2
(R : RegExp α) :
Similar (RegExp.concat RegExp.zero R) RegExp.zero
| concat_3
(R : RegExp α) :
Similar (RegExp.concat R RegExp.zero) RegExp.zero
| concat_4
(R : RegExp α) :
Similar (RegExp.concat RegExp.epsilon R) R
| concat_5
(R : RegExp α) :
Similar (RegExp.concat R RegExp.epsilon) R
| kleene_closure_1
(R : RegExp α) :
Similar (RegExp.kleene_closure (RegExp.kleene_closure R)) (RegExp.kleene_closure R)
| kleene_closure_2 :
Similar (RegExp.kleene_closure RegExp.epsilon) RegExp.epsilon
| kleene_closure_3 :
Similar (RegExp.kleene_closure RegExp.zero) RegExp.epsilon
example
{α : Type}
(RE_1 RE_2 : RegExp α)
(h1 : Similar RE_1 RE_2) :
RE_1.LanguageOf = RE_2.LanguageOf :=
by
induction h1
case union_1 R =>
simp only [RegExp.LanguageOf]
simp
case union_2 R S =>
simp only [RegExp.LanguageOf]
exact Set.union_comm R.LanguageOf S.LanguageOf
case union_3 R S T =>
simp only [RegExp.LanguageOf]
exact Set.union_assoc R.LanguageOf S.LanguageOf T.LanguageOf
case union_4 R =>
simp only [RegExp.LanguageOf]
exact Set.empty_union R.LanguageOf
case concat_1 R S T =>
simp only [RegExp.LanguageOf]
symm
exact Language.concat_assoc R.LanguageOf S.LanguageOf T.LanguageOf
case concat_2 R =>
simp only [RegExp.LanguageOf]
exact Language.concat_empty_left R.LanguageOf
case concat_3 R =>
simp only [RegExp.LanguageOf]
exact Language.concat_empty_right R.LanguageOf
case concat_4 R =>
simp only [RegExp.LanguageOf]
exact Language.concat_eps_left R.LanguageOf
case concat_5 R =>
simp only [RegExp.LanguageOf]
exact Language.concat_eps_right R.LanguageOf
case kleene_closure_1 R =>
simp only [RegExp.LanguageOf]
symm
exact Language.kleene_closure_idempotent R.LanguageOf
case kleene_closure_2 =>
simp only [RegExp.LanguageOf]
exact Language.kleene_closure_eps
case kleene_closure_3 =>
simp only [RegExp.LanguageOf]
exact Language.kleene_closure_empty
Shreyas Srinivas (Sep 05 2024 at 01:52):
I had dinner with Clement Pit-Claudel who pointed out that the reason people don't use Brzozowski derivatives for any computation is that the derivative size blows up (in automata terms).
Shreyas Srinivas (Sep 05 2024 at 01:55):
Maybe this thread helps wrt coinduction: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/how.20to.20hack.20around.20the.20lack.20of.20coinduction/near/465847930
Paige Thomas (Sep 05 2024 at 02:00):
I guess I'm not sure I understand? They seem to claim good running times in the article.
Shreyas Srinivas (Sep 05 2024 at 02:04):
Maybe I misunderstood Clement
Alok Singh (Sep 23 2024 at 19:30):
here's a youtube video on it with conal elliot: https://www.youtube.com/watch?v=OoKNpfUNpfU
Paige Thomas (Sep 24 2024 at 05:29):
(deleted)
Paige Thomas (Sep 24 2024 at 05:29):
(deleted)
François G. Dorais (Sep 24 2024 at 07:56):
Kayla Thomas said:
The paper I'm going by defines a similarity relation for comparing regular expressions, but I'm not sure how to go about turning it into a computable function.
I haven't looked at the paper carefully but I just put it on my reading list. Do you think there's a way to create a normal form to check for similarity? That could be efficient depending on how costly normalization is.
Paige Thomas (Sep 25 2024 at 02:28):
@François G. Dorais
I found this Coq code and translated it to Lean here. I think I need to prove the theorem at the end of that Lean file (the current attempt there is not working, there is a paper proof in appendix II (theorem 4.3a) of this paper) and somehow use it to prove termination for some version of this Lean function which is a translation of the algorithm on page 179 of this paper. There is also something called partial derivatives of regular expressions (paper here) that I don't really understand, which may be more efficient?
As I understand it, if two regular expressions are "similar", then their languages are the same, but their can also be regular expressions that are not similar whose languages are the same. So the DFA that is constructed using it is not guaranteed to be minimal.
Paige Thomas (Sep 25 2024 at 02:50):
So in short, I think yes, in the first link to the Coq code.
Shreyas Srinivas (Sep 25 2024 at 11:11):
One of the ICFP SRC winners (from Saarland) was a bachelor's thesis on an algorithm for showing the equivalence of omega regular expressions. It uses regular expression derivatives (although the derivative of is conductive, that might be unnecessary for normal regular expressions)
Shreyas Srinivas (Sep 25 2024 at 11:11):
I mean ICFP 2024
Last updated: May 02 2025 at 03:31 UTC