Zulip Chat Archive

Stream: lean4

Topic: Making lean4-parser fold combinators total for verified pars

Nicolas Rouquette (Feb 21 2026 at 18:43):

I'm working on a verified parser for YAML 1.2.2 built on top of lean4-parser and ran into a fundamental barrier: the core fold combinators (efoldlPAux, foldr, takeUntil, dropUntil, count, countUntil) are all partial def, which makes them opaque to Lean's kernel and blocks proving any properties about parsers that use repetition.

The root cause is that ParserT is polymorphic in the monad m, so inside p s >>= fun | .ok s' x => ... we can't extract the new stream state s' for a termination_by clause — we're inside a monadic bind over an arbitrary m.

With the help of Github Copilot + Claude Code Opus 4.6, I wrote up an issue exploring four strategies and proposing one:

Issue: https://github.com/fgdorais/lean4-parser/issues/95
PR: https://github.com/fgdorais/lean4-parser/pull/96

The short version: add a remaining : σ → Nat field to Parser.Stream that provides an upper bound on tokens left, then use it as fuel for structural recursion on Nat. The fold combinators become total while preserving their semantics — existing call sites are unchanged since fuel defaults to Stream.remaining s.

protected class Parser.Stream (σ : Type _) (τ : outParam (Type _)) extends Std.Stream σ τ where
  Position : Type _
  getPosition : σ → Position
  setPosition : σ → Position → σ
  remaining : σ → Nat  -- new: upper bound, must decrease when next? returns some

The key insight is that this is the only approach (of the four analyzed) that works inside the monadic bind for arbitrary m — specializing to Id, requiring Consuming proofs per-parser, or redesigning the fold API all have significant tradeoffs detailed in the issue.

The PR includes a reference implementation with all 6 partial defs replaced and all existing stream instances updated. It builds cleanly, and I've confirmed my downstream verified YAML parser project builds against it with only a one-line addition to its Parser.Stream instance.

I'd welcome constructive feedback on:

The remaining contract and whether the field belongs in Parser.Stream vs. a separate typeclass
The fuel-based termination approach vs. alternatives I may not have considered
Any concerns about backward compatibility (every Parser.Stream instance must now implement remaining)

Mirek Olšák (Feb 21 2026 at 21:00):

There is an interesting approach at #lean4 > Extrinsic termination proofs for well-founded recursion @ 💬 , could it work in your case? The trick is to define the function together with proven guarantees about its value in case it terminates. Then, it is possible to have both original partial function, as well as the backing by logic on terminating values, without even needing extra level of trust.

François G. Dorais (Feb 21 2026 at 23:03):

Yes, lean4-parser needs to be adapted to Std.Iterators. That's on my (long) todo list. Contributions are welcome but please consult me about details.

Nicolas Rouquette (Feb 22 2026 at 02:02):

Thanks for the constructive comments!

@Mirek Olšák's extrinsic termination proofs for well-founded recursion with library support like @Paul Reichert 's s Myopic,is a fantastic idea; however, it would require a complex refactoring of the lean4-parser's fold API.

@François G. Dorais's suggestion to use the Batteries' Std.Iterators effectively standardizes and generalizes what the remaining field does: An iterator that implements Finite carries a proof that it will eventually exhaust, which yields the termination argument fold combinators need.

I took a first pass at this in PR #97. The key ideas:

A LawfulParserStream typeclass (opt-in, Prop-valued) that requires proving remaining strictly decreases on each next? yielding some
A StreamIterator bridge that gives every Parser.Stream an Iterator instance, and every LawfulParserStream a Finite + IteratorLoop instance (enabling for loops)
All fuel-based fold combinators (efoldlPAux, foldr, takeUntil, dropUntil, countUntil) replaced with well-founded recursion via termination_by Stream.remaining
mkDefault remains partial — no LawfulParserStream instance, so existing users are unaffected

The LawfulParserStream proofs are done for Subarray and OfList; the String.Slice, Substring.Raw, and ByteSlice instances still have sorry (they need simp lemmas for byte-index arithmetic that I haven't found in stdlib/Batteries yet).

Is this along the lines of what you were thinking, @François G. Dorais?

Mirek Olšák (Feb 22 2026 at 15:53):

Nicolas Rouquette said:

Myopic,is a fantastic idea; however, it would require a complex refactoring of the lean4-parser's fold API

I don't understand. The very idea of an extrinsic termination proof is that you don't change the API at all, but provide option to prove facts about it ex-post (which is not possible with current partial)

Nicolas Rouquette (Feb 22 2026 at 18:33):

You're right, I misspoke — extrinsic termination proofs by definition don't change the API, that's the whole point. Sorry for the confusion.

What I should have said: the obstacle today is that Lean 4's partial compiles to opaque, which blocks reasoning about the function body entirely. So to prove termination of lean4-parser's fold combinators right now, we can't do it extrinsically against the existing partial defs — we'd need something like Myopic to make the implementation available for reasoning, and I wasn't sure about its maturity/applicability to monadic well-founded recursion like efoldlPAux (which recurses through >>= in ParserT).

The PR takes the intrinsic route instead: replace partial def with total def + termination_by Stream.remaining, which makes the functions directly usable in proofs without any extra tooling. But I agree that if/when extrinsic proofs for partial become well-supported in Lean, that would be a cleaner separation — keep the simple partial implementations and prove termination on the side.

Mirek Olšák (Feb 23 2026 at 09:05):

To me the worry is if inserting termination guarantees inside do not create extra annoyance to users of the parser who don't care about logic (because they just need some parsing on meta-programming level). But I have to admit I don't know the parser too much.

Last updated: Feb 28 2026 at 14:05 UTC