Zulip Chat Archive

Stream: CSLib

Topic: Formalization of regular expression


Gordon Hsu (Feb 20 2026 at 07:11):

Hi, I am new to CSLib. I formalized "regular expression matching a string" a while ago. I wonder if I can contribute it in some form to CSLib, or whether it is already done here?

https://github.com/kuotsanhsu/lean4-dump/blob/81357e9125c785748c1f553ca630711258e58b14/RegExp/Derivative.lean#L133

Also, I wonder if there is interest in porting "Software Foundations" by Benjamin C. Pierce et al. to Lean?

Gordon Hsu (Feb 22 2026 at 14:16):

Is anyone reading this? Does anyone have the time to a guide a newbie like me?

Chris Henson (Feb 22 2026 at 15:11):

I had meant to reply to this earlier. I believe there is already some ongoing work with more generality, as discussed at #CSLib > ideas for first contribution @ 💬

Chris Henson (Feb 22 2026 at 15:13):

Regarding Software Foundations: in my view it is a mix of technical and pedagogical work to make something idiomatic that adds something worthwhile in contrast to the "native" guides to learning Lean. There has been a recent effort to do this automatically with AI (see #Machine Learning for Theorem Proving > Autotranslation of Logical Foundations from Rocq to Lean) but I do not know of a concentrated human effort.

Gordon Hsu (Feb 22 2026 at 15:31):

I see, I look forward to the automatic translation of Software Foundations. In the meantime, do you think it's worth it to incorporate some hand rolled code? I have been using Software Foundations as a way to practice Lean. If it doesn't fit, I am also good with it.

https://github.com/kuotsanhsu/lean4-tutorials/tree/software-foundations/SoftwareFoundations/LogicalFoundations

Gordon Hsu (Feb 22 2026 at 15:34):

My RegExp link points to a formalized List Char matcher. Am I mistaken to not have seen the same thing in CSLib or Mathlib?

Chris Henson (Feb 22 2026 at 15:38):

Gordon Hsu said:

I see, I look forward to the automatic translation of Software Foundations. In the meantime, do you think it's worth it to incorporate some hand rolled code? I have been using Software Foundations as a way to practice Lean. If it doesn't fit, I am also good with it.

https://github.com/kuotsanhsu/lean4-tutorials/tree/software-foundations/SoftwareFoundations/LogicalFoundations

I don't think this fits into CSLib, but it can be a very good personal exercise! I have myself spent time with Software Foundations and PLFA.

Chris Henson (Feb 22 2026 at 15:43):

Gordon Hsu said:

My RegExp link points to a formalized List Char matcher. Am I mistaken to not have seen the same thing in CSLib or Mathlib?

Just making sure, you've seen docs#RegularExpression in Mathlib? The discussion I linked is talking about generalizing this.

Gordon Hsu (Feb 22 2026 at 15:49):

So sorry, you are right. There is this RegularExpression.rmatch_iff_matches' in Mathlib that does the same thing. Thank you so much for your help. All of my questions in this thread have been answered.

https://leanprover-community.github.io/mathlib4_docs/Mathlib/Computability/RegularExpressions.html#RegularExpression.rmatch_iff_matches'

Ching-Tsun Chou (Feb 22 2026 at 22:19):

Currently there is really nothing about regular expressions in cslib. You are welcomed to contribute things which are not already formalized in mathlib or cslib, such as the equivalence between regular expressions and regular languages.

Kim Morrison (Feb 23 2026 at 02:10):

What really matters in this space is a practical, efficient, and standards conformant (ha! funny thing to say about regex...) Lean implementation of regular expressions, that is as good as that should be without regard for verification, and then a verification layer on top of that.

Kim Morrison (Feb 23 2026 at 02:10):

The theory side is of course fun, but scarcely as useful or interesting.

Snir Broshi (Feb 23 2026 at 02:53):

Kim Morrison said:

What really matters in this space is a practical, efficient, and standards conformant (ha! funny thing to say about regex...) Lean implementation of regular expressions, that is as good as that should be without regard for verification, and then a verification layer on top of that.

Already exists: #general > Regex library for Lean? https://github.com/pandaman64/lean-regex
The readme doesn't show any benchmarks but it's probably reasonably performant

Chris Henson (Feb 23 2026 at 02:56):

Kim Morrison said:

The theory side is of course fun, but scarcely as useful or interesting.

I think some would debate this, but the broader point stands that I've not yet seen discussion about formalizing some actual real world standard in CSLib. There's for instance recent prior work in Rocq formalizing JavaScript regular expressions.

Chris Henson (Feb 23 2026 at 02:58):

@Snir Broshi I think there's still interest in verification of a commonly used and more feature complete standard like I link above.

Gordon Hsu (Feb 23 2026 at 03:21):

What draws me to Lean is the ability to program according to formal spec and prove correctness/time-complexity without relying so much on testing. I believe Dijkstra's vision of constructing programs alongside proofs is getting very practical with Lean especially since the introduction of mvcgen. Practically speaking, theory should guide implementation, and implementation should inspire the development of theory. POSIX and IEEE standards could really use some formalization, especially IEEE 754.

Shout out to lean-regex! The design is so awesome with unicode support, backtracking, VM, and correctness proofs. As a programmer, lean-regex is what I want to see in this space.

Henrik Böving (Feb 23 2026 at 07:54):

Snir Broshi said:

Kim Morrison said:

What really matters in this space is a practical, efficient, and standards conformant (ha! funny thing to say about regex...) Lean implementation of regular expressions, that is as good as that should be without regard for verification, and then a verification layer on top of that.

Already exists: #general > Regex library for Lean? https://github.com/pandaman64/lean-regex
The readme doesn't show any benchmarks but it's probably reasonably performant

The performance is not horrible for the general case but real world regex engines specialize lots of patterns to achieve performance orders of magnitudes larger than one could with the generic techniques. If we want it to be usable for a general real world use case we should try to do the same.

Fabrizio Montesi (Feb 23 2026 at 07:56):

General note: Please create a separate thread if you want to discuss Software Foundations (or PLFA). I'll stick to the regexp part here.

Fabrizio Montesi (Feb 23 2026 at 07:58):

Re lean-regex: are there ways we can help it through CSLib? It is indeed the kind of software that we'd like to support.

Snir Broshi (Feb 23 2026 at 09:57):

cc @pandaman

pandaman (Feb 23 2026 at 10:30):

Hi, I’m the author of lean-regex. Happy to see it mentioned here!

It seems to me that there are two substantially different lines of work in this area.

One is the formalization of the theory of regular languages: equivalence with automata, Myhill–Nerode, closure properties, etc. It's about the formalization of theoretical compute science, and Mathlib.Computability.RegularExpressions and many other attempts I've seen falls into this category.

The other is the verification of a real-world regex engine. That involves features beyond textbook regular languages (capture groups, substring matching, Unicode, etc), as well as algorithm implementations and their speed. (It's still 10x slower than an equivalent implementation in Rust, though)

These two efforts are certainly related and complementary, but I think they should be treated as distinct kinds of work. They differ in scope, design constraints, and intended audience. lean-regex is primarily in the second category: it’s about building and verifying an executable engine, rather than developing the general theory of regular languages.

pandaman (Feb 23 2026 at 10:35):

Re lean-regex: are there ways we can help it through CSLib? It is indeed the kind of software that we'd like to support.

I’m a bit hesitant to make lean-regex part of CSLib or downstream of it.

One of my goals is for lean-regex to be usable as a practical library in Lean, without requiring a Mathlib dependency. It feels suboptimal if someone writing Lean as a programming language has to pull in Mathlib just to get regex support.

Currently, the verification proofs in lean-regex do depend on Mathlib, but I’ve separated those into a different Lake package in the repository. In principle, that proof layer could be contributed to CSLib, but doing so would introduce some coordination and maintenance overhead, so I’d want to think carefully about whether that’s the right long-term structure.

That said, I’d definitely be happy to collaborate and see how we can make the pieces fit together cleanly!

Chris Henson (Feb 23 2026 at 11:18):

Hi! I think we're all on the same page about there being two lines of work. We so far have focused on the former, but consider the latter in scope for CSLib. This is why I mentioned the above work in Rocq of this verification category for JavaScript regex.

I agree the that thinking about coordination and long-term structure is required, on both sides.


Last updated: Feb 28 2026 at 14:05 UTC