Zulip Chat Archive
Stream: general
Topic: Regex library for Lean?
david cao (Aug 17 2025 at 07:37):
Hi friends! I've been looking to try using Lean as a general functional programming language, and I was wondering if there was a regex library in Lean (either a native implementation or some FFI situation) available? I'm building a tool that requires me to perform regex searches in text so I was wondering how I'd go about doing that :^)
Ruben Van de Velde (Aug 17 2025 at 07:41):
@pandaman had one, I think
pandaman (Aug 17 2025 at 07:47):
Yes! This library provides a Lean native implementation for regex: https://github.com/pandaman64/lean-regex/tree/main
Please let me know if you have any questions on the usage!
david cao (Aug 17 2025 at 07:47):
oh beautiful, tysm!
david cao (Aug 17 2025 at 07:55):
as a unrelated-ish question: are there commands in lean for browsing the documentation of definitions in lieu of html (a la the ? command in a julia repl)
Kenny Lau (Aug 17 2025 at 07:56):
if you do #check Nat.Prime and just hover over Nat.Prime then you'll see the documentation
Jon Eugster (Aug 17 2025 at 11:12):
@pandaman I had a short look at your regex package the other day trying to use it to find code blocks inside a markdown text. I got the impression that the implementation isnt frarure complete yet. Is this correct or did I just give up too quickly in my 30min fiddling around?
i.e. I got the impression the ?-operators dont exist, and also referencing previous matching groups. I tried something like
((`+)(.*?)\2)
to match arbitrary codeblocks with the same amount of starting and closing ticks. (Im typing this now on the phone, so I might have tried various variations of this syntax).
Any help to get the right regex would be appreciated:blush:
Alfredo Moreira-Rosa (Aug 17 2025 at 11:41):
Jon Eugster said:
pandaman I had a short look at your regex package the other day trying to use it to find code blocks inside a markdown text. I got the impression that the implementation isnt frarure complete yet. Is this correct or did I just give up too quickly in my 30min fiddling around?
Any help to get the right regex would be appreciated:blush:
There is another one (not so formaly proved) here that has more features : https://github.com/bergmannjg/regex
pandaman (Aug 17 2025 at 11:46):
@Jon Eugster Hi, thank you for trying lean-regex and sorry to hear you had the impression. Your impression is basically right: the library focuses on linear-time algorithms and intentionally omits features that require backtracking. In particular, it omits backreferences like \2 since they are known to be non-regular and require exponential time in the worst case. This design is similar to Go and Rust's regex implementations.
The problem of parsing matching parenthesis (or code fences in your case) is known to be not representable in a regular language, so I'd recommend using parser combinators/generators that has more power and support your use cases.
As for ? to make the repetition non-greedy, I forgot to implement the feature. I'll make an issue and fix it. Thank you!
François G. Dorais (Aug 17 2025 at 13:09):
There is also some regex support in lean4-parser. It may have some features that lean-regex doesn't but it was never designed for efficiency nor verification. I do have students working on an efficient and verified regex-related project but it is still very much WIP.
Jon Eugster (Aug 18 2025 at 12:31):
Thanks for the quick response @pandaman :)
I was just to lazy to implement my own parser, but I'll probably do that
Last updated: Dec 20 2025 at 21:32 UTC