Zulip Chat Archive

Stream: Program verification

Topic: Monadic program logic: Request for feedback


Sebastian Graf (Feb 14 2025 at 11:31):

Fellow Lean users,

As announced in the Lean FRO meeting at Lean Together 2025, I'm working on behalf of the Lean FRO on a Floyd/Hoare-style program logic framework to verify "arbitrary" monadic code written in Lean.
This thread is meant to collect requirements from you!

Scope: "Is this work relevant to my use case?"

TL;DR: this work could be relevant to you if you want to verify code written using Lean's do-notation.

Let me clarify what "program logic for monadic code" means by giving a sketch of the definition of Hoare Triple I have in mind:

def M α := ...
instance : Monad M where ...
def triple (x : M α) (pre : $states → Prop) (post : α → $states → Prop) : Prop := ...

(Find here the exact definition I'm currently working with, but the above is close enough to give you an idea.)
The intention is that any user may provide a definition of M and triple (plus structural rules) and then verify programs in M building on an extensive program logic with good automation and UX.
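
To make that concrete, here is a toy instantiation for a single-state monad (purely illustrative; the exact definition linked above is more general):

def triple {σ α : Type} (x : StateM σ α)
    (pre : σ → Prop) (post : α → σ → Prop) : Prop :=
  ∀ s, pre s →
    let r := Id.run (x.run s) -- r : α × σ
    post r.1 r.2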

In addition to scalable extrinsic proofs, the vision is to allow convenient intrinsic F*/Dafny/Verus-style proofs embedded into do blocks by elaboration into the extrinsic framework.

Note that although M α is "semantics of a program in Lean", you should be able to use this framework for any language with a denotational semantics into a monad.

This should include any strongly normalizing object language such as System F, but exclude untyped lambda calculus, System F with recursive types or mutable references (finding monadic semantic domains for the latter is an active research problem).

Request for feedback

I am interested in gathering any kind of feedback (please keep the focus on semantics rather than syntax, which is likely to change :smile:), or pointers to people I should reach out to that are not yet in the following list:

Jeremy Avigad, Mario Carneiro, Wojciech Nawrocki, John Tristan, Jonathan Protzenko

However, I'm specifically interested in feedback on the following items, which I encourage you to refer to in your answer:

  1. What is your use case? Can you point me to reasonably self-contained example code?
  2. We need to figure out what "arbitrary" monadic code means. For example, my prototype currently supports any kind of transformer stack that can be represented as StateT/ExceptT (plus ReaderT/WriterT and equivalents through StateT) over Id/IO. (Behavior of runtime IO functions will be axiomatised.) Is this sufficient for your use case? I'm asking because this plan does not currently include a Separation Logic to reason about IO.Ref or ST.Ref, and I'm not yet sure how much of a breaking change it would be to add this capability after the fact.
  3. The logic will include tactics on par with established Rocq frameworks such as CFML (of Separation Logic Foundations fame) or Iris. Good integration with the upcoming grind tactic is a given. Any other frameworks I should take a look at?
  4. My focus so far is on unary program logic. I have heard at least one voice who wants to work in a relational program logic (relating multiple program runs with different preconditions) for cryptographic proofs à la SSProve. Does your use case require a relational program logic?
  5. Any Related Work I should look at? I am aware of F*, Dafny, Verus, the SatisfiesM experiment from Batteries, the Rocq frameworks above and the literature on Dijkstra monads.

Thanks!

Mario Carneiro (Feb 14 2025 at 12:57):

Sebastian Graf said:

  1. The logic will include tactics on par with established Rocq frameworks such as CFML (of Separation Logic Foundations fame) or Iris. Good integration with the upcoming grind tactic is a given. Any other frameworks I should take a look at?

I assume you are aware of https://github.com/leanprover-community/iris-lean ? It is starting to gain momentum with some of the people working on Iris, so while it's currently not really ready for use, it may be in a better state once your work is getting closer to completion. Ideally these two frameworks will be able to work together seamlessly, and we should coordinate as needed to make sure this is the case.

Sebastian Graf (Feb 14 2025 at 15:17):

I would love to support that effort in some form. I could imagine the following definition for triple in terms of HeapProp (could probably be generalised to any BI):

def triple (x : M α) (P : HeapProp VAL) (Q : α → HeapProp VAL) :=
    (P -⋆ (OBSERVE x) Q)

This is the interpretation suggested on p141 in the Iris lecture notes.
But what is M? What is OBSERVE? What is VAL?

Provided that a user can answer these questions for their domain, I have no doubts that they will be able to use the framework I have prototyped.

I am however doubtful whether this will be useful to verify code using IO.Refs.
For example, even if we take M := IO and postulate OBSERVE as well as the structural rules of the logic, it is still quite difficult to find a VAL that lives in Type 0.
I have played around with using VAL := Σ α, α (because we want to allocate IO.Refs for any kind of α : Type), but that lives in Type 1. A potential shortcoming of my prototype is that this universe bump requires several ULifts in pure instantiations where we do not even need separation logic. Frankly, this is unacceptable.
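
Concretely, the universe bump in question:

-- One heap cell must be able to hold a value of an arbitrary type,
-- so it cannot live in Type 0:
def HeapVal : Type 1 := Σ α : Type, α
-- and a heap of such cells lives in Type 1 as well:
def Heap : Type 1 := Nat → Option HeapVal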

I tried the alternative of indexing all computations over a heap context Γ : Nat -> Type but stopped when I realized that I would need to sacrifice the functional dependency m -> stack in doing so, because Γ occurs in the instantiation of stack. That turned out to be a no-go as well, because it greatly regresses type inference for all other instances.

Long story short: I would be more than happy to anticipate support for future Separation Logics, however that support must not come at a loss of convenience for other instances of the framework.

Shreyas Srinivas (Feb 14 2025 at 15:33):

Sebastian Graf said:

As announced in the Lean FRO meeting at Lean Together 2025, I'm working on behalf of the Lean FRO on a Floyd/Hoare-style program logic framework to verify "arbitrary" monadic code written in Lean.
This thread is meant to collect requirements from you! [...]

I only know Iris and have passing experience with SatisfiesM and Dafny. The thing that annoyed me in SatisfiesM is how difficult it is to extract the guarantees from inside the monad and relate them to a specific input variable in a proof outside the monadic context. I also have a question: we currently use the Id monad for proofs. Would it be a good idea to add support for a monad with a heap structure for use with separation logics?

Bas Spitters (Feb 14 2025 at 15:48):

@Sebastian Graf Are you aware of the Dijkstra monad approach? I know there are other groups trying to implement it in Lean.
https://arxiv.org/abs/1903.01237

We've used it successfully for probabilistic relational Hoare logic in Coq: E.g. https://github.com/SSProve/ssprove

Sebastian Graf (Feb 14 2025 at 16:57):

Shreyas Srinivas said:

I only know Iris and have passing experience with SatisfiesM and Dafny. The thing that annoyed me in SatisfiesM is how difficult it is to extract the guarantees from inside the monad and relate them to a specific input variable in a proof outside the monadic context.

What is appealing about definitions such as SatisfiesM is their applicability across different monads. Unfortunately we were unable to prove basic properties such as conjunction (SatisfiesM x p → SatisfiesM x q → SatisfiesM x (fun a => p a ∧ q a)) with it. Then we tinkered with (the weaker)

def SatisfiesM {m : Type u → Type v} {α} [Monad m] (x : m α) : (α → Prop) → Prop :=
  fun p => ∀ {β} {f g : α → m β}, (∀ a, p a → f a = g a) → x >>= f = x >>= g

but had trouble defining infinite conjunction (i.e., the logic's forall) in terms of it. That was when I gave up on this approach and had a look at encodings of triple based on effect observations/Dijkstra monads (see my next post).

Shreyas Srinivas said:

I also have a question: we currently use the Id monad for proofs. Would it be a good idea to add support for a monad with a heap structure for use with separation logics?

I'm not completely sure I understand the question. If all programs that you want to prove correct can be expressed in Id (or any number of StateT on top of that), I don't think you need separation logic. You need separation logic only if your programs have dynamic memory allocation, because you need a way to say, e.g., "the ST.Ref a returned by ST.mkRef is distinct from any ST.Ref b returned by ST.mkRef in the future" and to then preserve any property of a after b has been allocated. Without dynamic memory allocation, you can express this property statically in the type system as a stack of StateTs; the frame rule is an appropriate liftM.
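
To illustrate that last point with a toy example (not prototype code): in a stack of two StateTs, the two state cells are separated by construction, and liftM plays the role of the frame rule:

abbrev TwoStates := StateT Nat (StateM Bool)

-- Touches only the outer Nat cell:
def bumpCounter : TwoStates Unit := modify (· + 1)

-- The lifted computation cannot touch the outer Nat cell, so any
-- property of that cell is preserved "for free":
def flipFlag : TwoStates Unit := liftM (show StateM Bool Unit from modify not)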

Sebastian Graf (Feb 14 2025 at 17:00):

Bas Spitters said:

Sebastian Graf Are you aware of the Dijkstra monad approach? I know there are other groups trying to implement it in Lean.
https://arxiv.org/abs/1903.01237

We've used it successfully for probabilistic relational Hoare logic in Coq: E.g. https://github.com/SSProve/ssprove

Yes, I am. In fact, until this week I had based my prototype framework around effect observations, defined by

class Observation (m : Type u → Type v) (w : outParam (Type u → Type x)) [Monad m] [∀ {α}, Preorder (w α)] extends MonadOrdered w where
  observe : m α → w α
  pure_pure : observe (Pure.pure a) = Pure.pure a
  bind_bind (x : m α) (f : α → m β) : observe (x >>= f) = observe x >>= (fun a => observe (f a))

You could define def triple x P Q := P ≤ x.observe Q in terms of this type class, and that would work well for most instances (except separation logic, where you want persistent separating implication as alluded to in my previous post instead of entailment).

Again, what is important for the planned framework is that it is maximally convenient for the "StateT/ExceptT on top of Id" use case. Observation/Dijkstra monads allow for a polymorphic monad stack, which is nice.
However, when I tried to actually prove stuff about monad-polymorphic functions such as forIn or foldlM, I noticed that it is incredibly difficult to do so. Here is an example for forIn, allowing the user to supply a loop invariant. (I have successfully applied this lemma in my experiments, so it is useful.) Note my difficulty expressing pre- and postconditions such as hpre and hstep in this framework:

theorem Observation.forIn_list {α β} {m : Type u → Type v} {w : Type u → Type x}
  [Monad m] [∀ {α}, Preorder (w α)] [obs : Observation m w]
  {xs : List α} {init : β} {f : α → β → m (ForInStep β)}
  (inv : List α → w β)
  (hpre : pure init ≤ inv xs)
  (hstep : ∀ hd tl,
      (inv (hd :: tl) >>= fun b => obs.observe (f hd b)) ≤ .yield <$> inv tl
    ∨ (inv (hd :: tl) >>= fun b => obs.observe (f hd b)) ≤ .done <$> inv []) :
    obs.observe (forIn xs init f) ≤ inv [] := by
    open Observation MonadOrdered in
    -- Intuition: inv encapsulates the effects of looping over a prefix of xs (and gets passed the suffix)
    -- The induction hypothesis is:
    let prog suffix := inv suffix >>= fun b => obs.observe (forIn suffix b f)
    suffices hind : prog xs ≤ prog [] by
      -- This is because the predicate transformer is stronger (≤) than the invariant,
      -- and the longer the suffix passed to `prog`, the more we rely on the predicate transformer.
      -- Conversely, the shorter the suffix, the more we rely on the invariant summarizing the effects of looping over a prefix.
      have : LawfulMonad w := inferInstance
      calc obs.observe (forIn xs init f)
        _ = pure init >>= fun b => obs.observe (forIn xs b f) := by simp only [pure_bind]
        -- For the initial case (full xs), we have an upper bound via hpre
        _ ≤ inv xs >>= fun b => obs.observe (forIn xs b f) := bind_mono hpre (by rfl)
        -- hind takes us from the full suffix xs to the empty suffix []
        _ ≤ inv [] >>= fun b => obs.observe (forIn [] b f) := hind
        -- For the final case (xs = []), forIn [] b reduces to pure b
        _ = inv [] := by simp
    -- Now prove hind : prog xs ≤ prog [] by induction
    clear hpre -- not needed any longer and would need to be generalized
    induction xs
    case nil => simp_all only [List.nil_append, le_refl]
    case cons hd tl h =>
      simp only [prog, List.forIn_cons, Observation.bind_bind, bind_assoc]
      cases hstep hd tl
      case inl hstep =>
        apply LE.le.trans _ h
        simp only [prog, List.forIn_cons, Observation.bind_bind, bind_assoc]
        apply LE.le.trans (MonadOrdered.bind_mono hstep (by rfl))
        simp only [bind_map_left, le_refl]
      case inr hstep =>
        simp only [prog, List.forIn_cons, List.forIn_nil, Observation.bind_bind, bind_assoc]
        apply LE.le.trans (MonadOrdered.bind_mono hstep (by rfl))
        simp only [bind_map_left, le_refl, prog]

Note

  • pure init ≤ inv xs expresses that init satisfies the precondition inv xs. Although I came up with this after long and hard thinking, I have no real intuition for what inv xs means. Its type w β is huge, potentially encompassing all kinds of predicate transformers.
  • In hstep, I want to express that the result of the loop iteration f hd b satisfies inv [] b' when it is .done b', and inv tl b' when it is .yield b'. This is incredibly difficult to state and requires a disjunction. The disjunction means hstep is going to be hard to automate for the user of the lemma.
  • While I could have required ∀ {α}, CompleteLattice (w α) and formulated hstep as (inv (hd :: tl) >>= fun b => obs.observe (f hd b)) ≤ (.yield <$> inv tl ⊔ .done <$> inv []), this would not have helped much with the automation problem.

So I found myself wondering whether we could find a way to expose the means to formulate proper pre and postconditions. In doing so I came up with the framework I propose and realized that we do not need Observation at all. Here's the equivalent lemma in the framework I propose:

theorem Triple.forIn_list {α β} {m : Type → Type}
  [Monad m] [LawfulMonad m] [MonadTriple m stack] [LawfulMonadTriple m stack]
  {xs : List α} {init : β} {f : α → β → m (ForInStep β)}
  (inv : List α → β → PreCond stack Prop)
  (hstep : ∀ hd tl b,
      ⦃inv (hd :: tl) b⦄
      (f hd b)
      ⦃r | match r with | .yield b' => inv tl b' | .done b' => inv [] b'⦄) :
  ⦃inv xs init⦄ (forIn xs init f) ⦃b' | inv [] b'⦄ := by
    induction xs generalizing init
    case nil => apply LawfulMonadTriple.triple_pure; simp
    case cons hd tl ih =>
      simp only [List.forIn_cons]
      apply LawfulMonadTriple.triple_bind
      case hx => exact hstep hd tl init
      case herror => simp
      case hf =>
        intro b
        split
        · apply LawfulMonadTriple.triple_pure; simp
        · exact ih

It is so much simpler. Maybe I'm doing something wrong or missing a trick in my encoding of effect observations. If that is indeed the case, I would like to see how to improve on Observation.forIn_list.

𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Feb 15 2025 at 17:45):

@Sebastian Graf I haven't tried thinking about the details of your proposed approach, but it looks well thought-out. The issues we ran into in the past were really quite basic:

  1. do notation does not provide a way to state and prove loop invariants. It seems that adding this is a foundational component of your strategy.
  2. With SatisfiesM specifically, we can prove p arg a where arg is the input, and a : α is the value produced by M α, but there is no way to reason about other cases, e.g. failure in Except: we may only show that if the result is .ok v then p arg v, but not that if the result is .err e then q arg e. Perhaps building except into your TransStack accounts for this?

allow convenient intrinsic F*/Dafny/Verus-style proofs embedded into do blocks by elaboration into the extrinsic framework

I am also curious about this statement. My current model of verified programming in DTT is that certain intrinsically-verified programs cannot be expressed extrinsically. Here is the idea:

def tizio : Id Nat := do
  for i in [0:n] with invariant P do -- imagined syntax
    ..
  have h : P (n - 1) := ..
  let elem := array[i]'(by simp [h])

In this program, the executable code includes an array-bound proof term derived from proofs about earlier parts of the program. I don't immediately see how this could be expressed extrinsically (i.e., write the code first and then prove it correct as a separate theorem), and I suspect that it cannot.

Henrik Böving (Feb 15 2025 at 18:16):

Hypothetically speaking, it should be possible to generate a function tizio_aux that contains the old body of tizio but also takes a proof argument abstracted over the necessary hypotheses and plugs these hypotheses in at the appropriate array spots, and then redefine tizio in terms of tizio_aux <your_external_proof_term>, right? I'm not saying that this is convenient, but it is at least a technical possibility and potentially automatable.

Devon Tuma (Feb 15 2025 at 19:14):

@Quang Dao and I have a definite use case for this in verifying cryptographic code (see here). Having a standardized program logic would be very useful to simplify proofs that currently require ad-hoc reasoning.

Specifically to your questions, in our use case we've been writing algorithms using a relatively simple monad stack, the main definition being a type to represent computations with oracles, which we use for uniform selection, random oracles, security game oracles, etc. More explicitly our main stack is:

inductive FreeMonad (f : Type u → Type v) (α : Type w) : Type (max (u + 1) v w)
  | protected pure (x : α) : FreeMonad f α
  | roll {β : Type u} (x : f β) (r : β → FreeMonad f α) : FreeMonad f α

-- Map from indexing set to domain/range type of that oracle
def OracleSpec (ι : Type u) : Type (max u (v + 1)) :=
  (i : ι) → Type v × Type v

-- Functor to represent queries with a given output type
inductive OracleQuery {ι : Type u} (spec : OracleSpec.{u,v} ι) :
    Type v → Type (max u v)
  | query (i : ι) (t : spec.domain i) : OracleQuery spec (spec.range i)

-- Most definitions are written in terms of this monad
def OracleComp {ι : Type u} (spec : OracleSpec.{u,v} ι) :
    Type w → Type (max u (v + 1) w) :=
  OptionT (FreeMonad (OracleQuery spec))

Operational semantics are essentially just the universal properties of OptionT/FreeMonad, letting us lift a mapping from OracleQuery to a new monad m to one from OracleComp to m. The monads you listed seem to cover all our use cases of this:

structure QueryImpl (spec : OracleSpec ι) (m : Type u → Type v) where
  impl {α : Type u} (q : OracleQuery spec α) : m α

def simulateQ [Monad m] [Alternative m]
    (so : QueryImpl spec m) : OracleComp spec α → m α :=
  OptionT.mapM' (FreeMonad.mapM' m so.impl)

-- Example responding to repeated queries with a cache using StateT
def cachingOracle [spec.DecidableEq] :
    QueryImpl spec (StateT spec.QueryCache (OracleComp spec)) where
  impl | query i t => do match (← get) i t with
    | Option.some u => return u
    | Option.none => QueryCache.queryFresh i t

Denotational semantics are just the case of mapping this into the PMF monad (actually OptionT PMF to represent a spmf):

noncomputable def evalDist [spec.FiniteRange] {α : Type u} :
    OracleComp spec α → OptionT PMF α :=
  simulateQ fun | query _ _ => OptionT.lift (PMF.uniformOfFintype _)

noncomputable def probOutput (oa : OracleComp spec α) (x : α) : ℝ≥0∞ :=
  (evalDist oa).run (some x)

I do think a relational program logic would be nice for our case, although the unary case would also be helpful.

Michael Sammler (Feb 17 2025 at 07:44):

Hi, this sounds quite interesting. I am currently not working with Lean, but I am quite interested in using it for Iris-based verification in the future.
We recently presented a modular method for building program logics at POPL: https://dl.acm.org/doi/pdf/10.1145/3704847
Our approach is based on the free monad (or more concretely its implementation in Rocq in the form of interaction trees) and it allows one to modularly combine different effects in a program and reuse proofs. In particular, I think this approach provides a higher degree of modularity and reuse than a Dijkstra-monad-based approach.
Our target was to modularly build program logics for Iris and I would expect that one wants to use a similar approach if Iris gets ported to Lean. But I have not thought about general monadic programs, so maybe the tradeoffs are a bit different there.
Maybe some ideas from that paper could also be useful for what you have in mind. Let me know if you have any questions about this paper. I would be happy to talk more about this.

Sebastian Graf (Feb 17 2025 at 09:24):

Wojciech Nawrocki said:

Sebastian Graf I haven't tried thinking about the details of your proposed approach, but it looks well thought-out. The issues we ran into in the past were really quite basic:

  1. do notation does not provide a way to state and prove loop invariants. It seems that adding this is a foundational component of your strategy.

Thanks! Indeed. Stating and proving loop invariants is just "the specification lemma" for forIn that I called Triple.forIn_list above.

  1. With SatisfiesM specifically, we can prove p arg a where arg is the input, and a : α is the value produced by M α, but there is no way to reason about other cases, e.g. failure in Except: we may only show that if the result is .ok v then p arg v, but not that if the result is .err e then q arg e. Perhaps building except into your TransStack accounts for this?

Yes; the PostCond type contains a failure "continuation" for every ExceptT in the stack. That said, this is not the only viable design. I think that if the one-specification-monad-to-rule-them-all SatisfiesM had worked out, we could encode the same information syntactically rather than semantically via the semantics of tryCatch.
So something like SatisfiesM (tryCatch (.ok <$> e) (pure ∘ .error)) (fun | .ok a => p a | .error e => q e) would work (tryCatch is used to "interpret" ExceptT here).
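
To illustrate the first point, the shape of PostCond specialised to a single ExceptT ε over one state σ would be something like (simplified sketch; the actual definition is indexed by the shape of the transformer stack):

def PostCondSketch (α ε σ : Type) : Type :=
  (α → σ → Prop) × (ε → σ → Prop) -- success case × failure "continuation"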

allow convenient intrinsic F*/Dafny/Verus-style proofs embedded into do blocks by elaboration into the extrinsic framework

I am also curious about this statement. My current model of verified programming in DTT is that certain intrinsically-verified programs cannot be expressed extrinsically. Here is the idea:

def tizio : Id Nat := do
  for i in [0:n] with invariant P do -- imagined syntax
    ..
  have h : P (n - 1) := ..
  let elem := array[i]'(by simp [h])

In this program, the executable code includes an array-bound proof term derived from proofs about earlier parts of the program. I don't immediately see how this could be expressed extrinsically (i.e., write the code first and then prove it correct as a separate theorem), and I suspect that it cannot.

By "intrinsic" above I was referring to taking a function that would type-check all on its own, and then augmenting its definition with requires, ensures, assert and invariant syntax to specify a contract intrinsically. I would hope that the ensuing verification conditions (predicate entailments) would be discharged by e.g. grind for maximal convenience.
This would be equivalent to implementing a post-processing step that extracts requires, ensures, asserts etc. and turns them into an extrinsic specification that can be proven entirely by a fixed set of tactics.
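
Schematically (imagined syntax on the intrinsic side, loosely mirroring the triple sketch from the OP):

-- Intrinsic style:
--
--   def incr (n : Nat) : Id Nat
--       ensures r => r = n + 1 := do
--     return n + 1
--
-- would elaborate to the unannotated def plus an extrinsic specification
--
--   theorem incr.spec (n : Nat) : triple (incr n) True (fun r => r = n + 1)
--
-- whose verification conditions are discharged by a fixed set of tactics.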

In your example, the function wouldn't type-check without the array-bounds proof, so that would not fall within the "intrinsic specification" use case.
On a related note, while true loops and partial def are unlikely to be supported. Perhaps the definitions provided by the partial_fixpoint work can be utilized, but that is hard to anticipate yet.

Sebastian Graf (Feb 17 2025 at 09:47):

Devon Tuma said:

Quang Dao and I have a definite use case for this in verifying cryptographic code (see here). Having a standardized program logic would be very useful to simplify proofs that currently require ad-hoc reasoning.

Specifically to your questions, in our use case we've been writing algorithms using a relatively simple monad stack, the main definition being a type to represent computations with oracles, which we use for uniform selection, random oracles, security game oracles, etc.

Indeed, that should work nicely. Can you give an example of a relational property you would like to see specified? That is, how would triple need to be generalized for your relational use case? Would something like this help? (NB: $states → Prop is like a set of tuples of $states)

def reltriple (x : M α) (pre : ($states → Prop) → Prop) (post : (α → $states → Prop) → Prop) : Prop := ...

Example for a pre and postcondition modelling non-interference on StateT {secret : Bool, public : Nat} Id:

def pre := fun S => ∀ σ₁ σ₂, S σ₁ → S σ₂ → σ₁.secret = true → σ₂.secret = false → σ₁.public = σ₂.public
def post := fun R => ∀ α₁ α₂ σ₁ σ₂, R α₁ σ₁ → R α₂ σ₂ → α₁ = α₂ ∧ σ₁.public = σ₂.public

(The interpretation of these predicates in reltriple must be monotonic.)

The problem with the relational case is that the design space is so large.

  • I think the definition reltriple above characterises many 2-properties (of the same program), but some relational properties want to relate two different programs with each other, and the above does not allow for that.
  • There is also the question of whether one wants to relate a code path that throws a particular exception ε with a successful execution returning an α. This affects the encoding of PostCond.

I think we might want to hold back on this feature until we have more information on what kind of relational properties people want to express.

Sebastian Graf (Feb 17 2025 at 10:49):

Michael Sammler said:

Hi, this sounds quite interesting. I am currently not working with Lean, but I am quite interested in using it for Iris-based verification in the future.
We recently presented a modular method for building program logics at POPL: https://dl.acm.org/doi/pdf/10.1145/3704847
Our approach is based on the free monad (or more concretely its implementation in Rocq in the form of interaction trees) and it allows one to modularly combine different effects in a program and reuse proofs. In particular, I think this approach provides a higher degree of modularity and reuse than a Dijkstra-monad-based approach.

Great work! That looks like a neat approach if you want to reason about your own bespoke object language.
Though a small nit: Judging from the non-structural recursion in the application case in Fig. 9 I think I would have referred to the approach as "coinductive big-step" rather than "denotational".
On the other hand, this makes me realize that my use of "denotational" in the OP could perhaps be weakened to "coinductive big-step". That said, the program logic will only be useful to reason about calls to the interpreter, and not so much for reasoning about expressions in the object language; that would require bespoke specification lemmas and perhaps an entire tactic stack on top. At this point, I'm unsure whether the program logic is so useful at all for this use case...

Just this weekend I had a look at the FreeSpec library which is based on free monads, but they don't really apply it to a language with an interesting semantics...

Regarding Iris: the goal of the planned framework is to provide a program logic for reasoning about programs written in Lean as shallowly-embedded object language. It is not currently a goal to be forward compatible with every future project, especially not if it is not about verifying programs written in Lean.
The program logic of Iris (in contrast to just its logic of bunched implications implemented in iris-lean) in particular is geared towards operational semantics of "deeply-embedded" languages.

I think the main limiting factor when it comes to reusing the planned framework for custom, non-standard monads is that the shape of pre and postconditions must be encodable in a fixed language that is currently targeted towards stacks of StateT s and ExceptT ε. I'm not sure if this is expressive enough for general free monads or interaction trees.

Michael Sammler (Feb 17 2025 at 11:22):

Thanks for your thoughts on this!

Though a small nit: Judging from the non-structural recursion in the application case in Fig. 9 I think I would have referred to the approach as "coinductive big-step" rather than "denotational".

I guess it depends on the view point and we also had some discussion about whether to use "denotational" or some other term when writing the paper. We landed on "denotational" since I would argue that one can view InteractionTrees as a semantic domain that we denote our expressions into, but one can argue about it. (In fact, I would see the non-structural recursion you mention as an argument for calling the approach denotational since we denote recursion into a recursion combinator of the semantic domain instead of doing the fixpoint ourselves, but I guess this is a matter of perspective.)

That said, the program logic will only be useful to reason about calls to the interpreter, and not so much for reasoning about expressions in the object language; that would require bespoke specification lemmas and perhaps an entire tactic stack on top.

I should clarify: The Program Logics a la Carte paper has two parts that can be separated: First, it provides a program logic for effectful programs (in the form of interaction trees) that provides nice modular reasoning principles. Second, it shows how one can use this program logic to build program logics for object-level languages on top of it. For your purposes, only the first part is really relevant and I think you can ignore all parts about the object-level languages. (This might be interesting to look at in the future, but probably not in the initial step.)

Just this weekend I had a look at the FreeSpec library which is based on free monads, but they don't really apply it to a language with an interesting semantics...

At least from my perspective, interaction trees have superseded the FreeSpec library as the canonical implementation of free monads in Rocq, so I would look at them instead. There have been quite a few interesting semantics formalized using interaction trees at this point (see e.g. the paper).

The program logic of Iris (in contrast to just its logic of bunched implications implemented in iris-lean) in particular is geared towards operational semantics of "deeply-embedded" languages.

Just to make sure that we are on the same page: The Program Logics a la Carte paper proposes to replace the program logic part of Iris that is geared towards operational semantics of deeply-embedded languages with the denotational approach based on interaction trees. I think the denotational way has significant advantages (in particular wrt. modularity) and if Iris gets ported to Lean, it might make sense to port the denotational approach to build a program logic instead of the old operational-semantics-based one.

I think the main limiting factor when it comes to reusing the planned framework for custom, non-standard monads is that the shape of pre and postconditions must be encodable in a fixed language that is currently targeted towards stacks of StateT s and ExceptT ε. I'm not sure if this is expressive enough for general free monads or interaction trees.

This is actually something that surprised me when working on that paper: the approach we use there, which is not too different from what you are proposing, scales to a surprisingly large number of effects. In the paper, we support the following effects among others: state, non-determinism, undefined behavior, aborting, concurrency, I/O with trace properties, non-termination. However, some of these effects rely on the expressiveness of Iris (e.g. non-termination relies on step-indexing), but maybe it is possible to achieve a similarly large set of effects for your approach.

It is not currently a goal to be forward compatible with every future project, especially not if it is not about verifying programs written in Lean.

This makes a lot of sense and I fully agree with this goal. To be clear, I don't want to argue that it makes sense to directly use the approach from the paper, instead I just want to share my perspective as someone who has worked on similar problems in the past and hope that this is useful for finding a solution that works well for Lean.

Sebastian Graf (Feb 17 2025 at 11:46):

Thanks again for your reply! I will definitely have a closer look at the first part of your paper then.
It is pretty likely that interaction trees are out of scope as an out-of-the-box instance of MonadTriple because there currently is no formalisation of interaction trees in Lean's standard library. However, I would hope that your paper will guide me to a framework which could be instantiated for a user-written interaction tree library.

Son Ho (Feb 17 2025 at 11:52):

Hi, this would be really useful for the Aeneas project that I'm working on (in collaboration with Jonathan Protzenko and others), where we do extrinsic proofs about pure models generated from Rust code, which live in an error monad:

inductive Result (α : Type u) where
  | ok (v: α): Result α
  | fail (e: Error): Result α
  | div -- for partial_fixpoint

We had to put a fair amount of work into writing a progress tactic to do proofs in a Hoare-logic style; it is roughly equivalent to CFML's xapp, though in our case we are not concerned with framing separation logic specifications but rather with automatically discharging preconditions.

For now, because we only work with one monad and do not specify failing or divergent executions (we prove total correctness theorems), we write our theorems in a very specific shape (progress is mostly syntax-driven; I would like something more general, of course). For instance, this is the theorem used to reason about addition of U32 machine integers, which progress knows how to manipulate (it automatically looks it up whenever it sees a call to +):

@[progress] -- saves the theorem in the relevant database
theorem U32.add_spec {x y : U32}
  (hmax : (x : Int) + y ≤ U32.max) : -- precondition
  -- there is generally a list of preconditions, which live in different theories (i.e., require different solvers)
  ∃ z, x + y = ok z ∧ -- there are usually several existentially quantified variables
  (z : Int) = x + y -- post-condition
  := by ....

This is a specific class of proofs, but quite informative of what we need in practice, I believe, and I have a few examples of proofs that you can have a look at. For instance, there is the tutorial which showcases proofs about list-manipulating functions and a (minimal) big num library. We also have proofs about an AVL tree and a hashmap (the latter one is quite old and needs deep cleaning). I'm currently working on the verification of a Rust implementation of the ML-KEM post-quantum cryptographic primitive (not public yet).

I believe there are quite a few things that need improvement and that would benefit from a more systematic approach.

  1. For now, we only work with an error monad but in the future we will have to work with code which lives in more expressive monads to reason about some form of aliasing, I/O, or concurrency. An extremely useful feature would be a way of seamlessly switching between monads, for instance by having a way of lifting theorems (from, say, an error monad to a state-error monad; see the sketch after this list). Also, I mentioned that we currently only prove total correctness theorems, but users of the framework may also want to prove partial correctness theorems. It would be nice to have a way of easily switching between partial correctness and total correctness proofs, in particular by factoring theorems out. For instance, it should be possible to write a correctness theorem about + which specifies both the ok and fail cases, and use it in both partial and total correctness proofs, in the latter case by using the specification of fail as a precondition that has to be disproved.

  2. It is currently not extremely convenient to reason about sub-expressions; the problem of loops has been mentioned above, but I also need a way of easily reasoning about expressions like this (today progress is good at reasoning about function calls):

let i1 ← if b then ok i else i + 1
...
  3. (These are perhaps more technical details at this stage, but I think it's important to mention them to give a sense of the proof experience.) So far, most of the work has been put into writing the progress tactic so that the proofs are smooth, in particular when it comes to discharging preconditions. This tactic uses several simple, yet extremely useful tactics (simp, a tactic which looks a bit like assumption but uses more clever heuristics to instantiate metavariables, and a solver based on omega); I can share a document about this if you are interested. Those tactics are currently hardcoded, but in the future we want them to be customizable: one non-obvious problem here is how to quickly figure out which solver should be applied to which precondition (we can't just try them one after the other, as they tend to be expensive). It can also happen that we want to switch between different flavours of "spec" theorems depending on the program we verify. For instance, I mentioned the problem of partial vs. total correctness above. Another example is that in most situations today we want to use the + theorem I showed above, which specifies addition in terms of Int, but when reasoning about a function performing bitwise operations we want to systematically look up theorems which specify arithmetic operations in terms of bitvector operations, etc.
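
Here is a small sketch of the monad lifting from item 1 (hypothetical, using Except in place of our Result): a fact established in the error monad transfers to a state-error stack along liftM, and this is the kind of theorem lifting we would like to get for free:

theorem liftM_spec {σ ε α : Type} (x : Except ε α) (a : α)
    (h : x = .ok a) (s : σ) :
    (liftM x : StateT σ (Except ε) α).run s = .ok (a, s) := by
  subst h; rfl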

Joachim Breitner (Feb 17 2025 at 13:01):

Son Ho said:

do not specify failing or divergent executions (we prove total correctness theorems) we

You mean “partial correctness”, right?

Son Ho (Feb 17 2025 at 13:13):

Joachim Breitner said:

You mean “partial correctness”, right?

I was slightly imprecise. I believe a partial correctness theorem would go like this:

forall x y, f x = ok y -> post y

In our case we prove panic freedom, which means our theorems are total correctness theorems, I think:

forall x, pre x -> exists y, f x = ok y /\ post y

But because here fail actually models a panic in Rust, and as it is possible to recover from panic in some situations, we may want to prove theorems of the following shape:

forall x,
match f x with
| ok y -> post y
| fail -> not (pre x) -- something like this probably?

Joachim Breitner (Feb 17 2025 at 14:21):

Ah, yes, that looks like total correctness to me. I misread the prose then.

Sebastian Graf (Feb 17 2025 at 15:10):

Thanks, Son Ho, your pointer to Aeneas' progress tactic and its examples will be incredibly valuable. I think I'm already starting to encounter similar challenges in the examples I'm looking at (without a tactic such as progress). The Result type seems entirely reasonable to instantiate triple for.

more expressive monads to reason about some form of aliasing, I/O, or concurrency

As alluded to in #Program verification > Monadic program logic: Request for feedback @ 💬, I have played around with an encoding of separation logic, but realized that you would need to somehow restrict the kinds of computations that end up in the heap to a closed universe, so that its entries can live in Type 0 rather than the Type 1 needed for the general Σ α, α. For this reason, I do not yet have plans to include a monad transformer adding a separation logic-based heap out of the box, but since the syntax of predicates is the same as for StateT, we could nurture such a transformer in a separate library first. I do not yet know how to model concurrency in this framework appropriately.

This tactic uses several simple, yet extremely useful tactics [...]. I can share a document about this if you are interested.

I would be very interested in that. It is something that I have been wondering as well, but put on my backlog because it is impossible to solve without seeing concrete instances, which I hopefully will see soon.

Son Ho (Feb 17 2025 at 15:22):

Sebastian Graf said:

I would be very interested in that.

I wrote this down in my PhD dissertation, chapter 14.3.3 "Hoare-Logic Style Proofs and the progress Tactic". I will be very happy to provide additional feedback if you have questions.

Son Ho (Feb 17 2025 at 15:35):

Sebastian Graf said:

but since the syntax of predicates is the same as for StateT, we could nurture such a transformer in a separate library first

Yes, at this stage I'm also not sure what's the best way to proceed and what I will actually need either. But I intend to study the problem of verifying stateful code (i.e., code which I can't model only with an error monad) later this year: this should provide interesting insights about how to combine these kinds of reasonings.

Mario Carneiro (Feb 17 2025 at 23:28):

Sebastian Graf said:

Regarding Iris: the goal of the planned framework is to provide a program logic for reasoning about programs written in Lean as shallowly-embedded object language. It is not currently a goal to be forward compatible with every future project, especially not if it is not about verifying programs written in Lean.
The program logic of Iris (in contrast to just its logic of bunched implications implemented in iris-lean) in particular is geared towards operational semantics of "deeply-embedded" languages.

I think the main limiting factor when it comes to reusing the planned framework for custom, non-standard monads is that the shape of pre and postconditions must be encodable in a fixed language that is currently targeted towards stacks of StateT s and ExceptT ε. I'm not sure if this is expressive enough for general free monads or interaction trees.

My inclination regarding this is that you should take care to separate the syntax of do notation programs with invariants from the interpretation of those programs and specifications wrt lean shallow embedding. I agree that lean shallow embedded programs should be the main focus, but it should be possible to leverage some parts of the frontend to do similar things even if the specification language is something fancy.

This is more or less what I mean by "working together with iris-lean": it should be possible for iris-lean to present an interface that feels native to the greatest extent possible. I don't want to tie your hands too much here, but please don't do like do did and make a completely non-extensible system (and leave it unmaintained for a long time). I'm sure you weren't really thinking of doing so anyway, but I want to stress that extensibility and pluggability of the architecture is very much a use case that matters to me.

Wrenna Robson (Mar 20 2025 at 18:03):

@Sebastian Graf I just found this thread. I am literally right now trying to verify some code that uses do notation and running into the issue that I don't have for loop pre- and post-conditions! (In particular, it makes working with mutable arrays where I want to prove facts about their size at the end of the loop quite tricky.)

Wrenna Robson (Mar 20 2025 at 18:04):

In general I am very interested in this work.

Sebastian Graf (Mar 20 2025 at 18:17):

You could give my prototype a try; here's an example for proving a loop-based fibonacci function correct: https://github.com/sgraf812/mpl/blob/682c1f7ef8dfee8f1b565338c5a8f3c80d6442f1/Tests/Toy.lean#L284.
Disclaimer: Not "ready for production" yet.

What kind of code are we talking about? What kind of mutable arrays in particular? Is it the local kind of mutability, i.e. just let mut arr : Array Blah := ...?

Wrenna Robson (Mar 23 2025 at 07:19):

Yes, it's nothing particularly fancy. Though it uses vectors rather than arrays which makes the types more of a headache.

Sebastian Graf (Apr 25 2025 at 14:24):

I think the dust on the semantic foundations and the general UX vision for manual proofs has settled.
It's time to elicit a bit of guinea pig feedback! You can depend on mpl by adding the following requires to your lakefile.toml:

[[require]]
name = "mpl"
git = "https://github.com/sgraf812/mpl"
rev = "pre-release"

Alternatively, you can just clone the repository and play with the files in Test/, such as Test/Toy.lean.

Bear in mind that this is not an official release! This library is not ready for production use, so please do not share it widely yet. The automation is still hacky and underdeveloped and subject to change.

I would be very happy for you to import MPL, give it a go and let me know about the general proof mode UX (examples in Test/Toy.lean). I am specifically interested in your opinion on

  1. The semantic foundations. Hoare triples are encoded based on a notion of predicate transformers that is parameterized by a PostShape parameter. Batteries are included for the typical transformer primitives (StateT, ExceptT, Id, a better-behaved Idd, IO). Would you say this is enough for your use case?
  2. The Iris-like proof mode for manipulating stateful hypotheses in a pointfree way, one that does not require repeated introduction of state variables. (Check out Test/ProofMode.lean for a comprehensive set of small examples.)
  3. The prototypical syntax in the README.md that extends def-elaboration with support for intrinsic verification features (requires, ensures, invariant). It bears repeating: You will find the proof automation lacking once you type in more complicated examples; this is mostly about the general look and feel for now. You have to import MPL.Experimental to opt into that feature.

Next up on my agenda is writing an actual verification condition generator that replaces the simp-based mwp tactic and grow that in parallel with small to medium-sized verification projects. We were thinking about a congruence closure algorithm; any other ideas?

Mario Carneiro (Apr 27 2025 at 20:56):

https://github.com/sgraf812/mpl

Bas Spitters (May 01 2025 at 18:28):

@Sebastian Graf this looks nice. I have yet to get a full overview. Are you planning a write-up?
The automatic proofs of equivalence between imperative and functional code are nice.
We added them in the Rocq backend for Hax some time ago (for the state monad):
https://eprint.iacr.org/2023/185

Eric Wieser (May 02 2025 at 05:09):

It looks like your Idd would be replaced by lean4#7352

Sebastian Graf (May 02 2025 at 06:41):

Bas Spitters said:

Sebastian Graf this looks nice. I have yet to get a full overview. Are you planning a write-up?
The automatic proofs of equivalence between imperative and functional code are nice.
We added them in the Rocq backend for Hax some time ago (for the state monad):
https://eprint.iacr.org/2023/185

Thanks! Unfortunately, a write-up is not on my agenda yet; building automation currently has my full attention.
Your work on relating Hax with Jasmin programs looks interesting, thanks for sharing. One open semantic question of mpl is whether translating StateM σ α into assertions (i.e., preconditions) of the form σ -> Prop is really future-proof with respect to a hypothetical future in which mpl builds on separation logic, for example to verify code using ST.Ref. In that future, σ -> HProp (where HProp ~= Heap -> Prop) seems like a redundant encoding if StateT σ can be encoded like let mut in your pure functional Hax translation, that is, by allocating a "static"/"global" heap location for each StateT σ wrapper in the monad stack. This is also what AutoCorrode does (see for example swap_ref_contract). However, this all hinges on whether future mpl will build on separation logic, which would sacrifice a lot of the simplicity of the current approach.
I wonder: You write that you have a model of the global heap. Does this model include some notion of separating conjunction in the logic? How do you cope with aliasing? Presumably you rely on let mut aliasing being syntactically apparent? Edit: Ah, 5.4 has some discussion about disjointness. Sounds quite involved...

Sebastian Graf (May 02 2025 at 06:44):

Eric Wieser said:

It looks like your Idd would be replaced by lean4#7352

Indeed! Fixing lean4#7352 would make Idd unnecessary. It's somewhere on my backlog to take over your patch unless you merge it first :)

Bas Spitters (May 04 2025 at 12:39):

@Sebastian Graf The equivalence proof between imperative and functional code, is before the connection with jasmin, although it extends to it. We haven't considered the connection to separation logic yet (only superficially), as we are mainly concerned with cryptographic code, where this seems less relevant.

Markus de Medeiros (Jun 20 2025 at 05:36):

re. question 5 in your initial post, you might want to look at PulseCore. I haven't been following either project very closely, so I'm not sure how similar they are at a technical level, but it's an Iris-like logic for shallowly-embedded monadic programs (in F*) which seems at least adjacent to your work.

Sebastian Graf (Jun 23 2025 at 10:11):

Indeed, their use of indirection theory (which I had not been aware of) to define step-indexed heaps and predicates is very interesting. Much less conceptual overhead compared to Iris' COFE setup, it appears. It would be a great fit for verifying programs in a monad that has an inherent or synthesized notion of time/step-indexing. In my Lean experiments so far, I haven't seen any. However, if one were to define separation logic for ST on top of MPL, indirection theory sounds like a good fit.

Shreyas Srinivas (Jun 29 2025 at 18:09):

@Sebastian Graf : I want to try out mpl, but before that I would like to ask: does it support the notion of ghost states like iris. If so, is there a specific example I should look at in the repo?

Shreyas Srinivas (Jun 29 2025 at 18:10):

Here I just mean variables that are not accessible in the code, only in the verification conditions.

Sebastian Graf (Jun 30 2025 at 07:31):

Shreyas Srinivas said:

Sebastian Graf : I want to try out mpl, but before that I would like to ask: does it support the notion of ghost states like iris. If so, is there a specific example I should look at in the repo?

I archived mpl on Friday after I merged my work into Core. The easiest way is to pick a nightly after 06-27 and import Std.Tactic.Do, then state the specs you want and use mvcgen to prove them. This is not an official release yet, but I'm very happy for feedback from early adopters. In the next Q, I'll be working on Markus' human-eval-lean project for dogfooding and examples.

Sebastian Graf (Jun 30 2025 at 07:53):

Re: ghost state. The framework supports ghost variables, that is, schematic variables of the specification {{P}} prog {{Q}} that occur only in P or Q, but not in the program prog. This spec

theorem mkFreshNat_spec [Monad m] [WPMonad m sh] :
  ⦃⌜#fst = n ∧ #snd = o⌝⦄
  (mkFreshNat : StateT (Nat × Nat) m Nat)
  ⦃⇓ r => ⌜r = n ∧ #fst = n + 1 ∧ #snd = o⌝⦄

can be considered an example of this. It simply gives names n and o to parts of the pre-state. I think this is in line with "variables that are not accessible in the code, only in the verification conditions".

This is of course a far cry from what systems such as Iris or Why3 consider ghost state. Such advanced notions of ghost state should be modeled by a custom WP instance that reflects the notion of ghost state that is specific to your particular monad into a heap-like structure that is part of the σ in PostShape.arg σ.

So for example when your program monad has a notion of time/step-indexing, then you can define your heap model in terms of this stratification and a notion of ghost state as well. I have not attempted anything like it yet, because the main intended use case of the logic is simple StateT/ExceptT wrappers without much dynamic memory allocation and certainly no notion of a runtime-relevant "program counter" which is often present in a deep rather than shallow embedding of a program semantics.

So TL;DR: if you want Iris-like "ghost state", then first we need to clarify what monad your programs run in, in order to define what "ghost state" means. Then we can perhaps find a way to generalize many different kinds of monad-specific "ghost state" notions in a reusable library.

Alok Singh (Jul 01 2025 at 22:04):

How could I write this Dafny program in Lean with the new syntax?

method add(a: array<int>, b: array<int>) returns (res: array<int>)
requires a.Length == b.Length
ensures res.Length == a.Length
ensures forall i :: 0 <= i < a.Length ==> res[i] == a[i] + b[i]
{}

Sebastian Graf (Jul 02 2025 at 09:46):

Hmm. The idiomatic way to implement add is via zipWith and that wouldn't be monadic. You wouldn't use Hoare logic to prove it correct.
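
(For reference, something like the following; a real version would use zipWith or bounds proofs instead of the panicking a[i]!:)

def addPure (a b : Array Int) : Array Int :=
  (Array.range (min a.size b.size)).map fun i => a[i]! + b[i]!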

The slightly less idiomatic way would be to define add in Id and use a for loop:

def add (a b : Array Int): Id (Array Int) := do
  let mut res := Array.emptyWithCapacity a.size
  for ha : i in [0:min a.size b.size] do
    have : i < min a.size b.size := by get_elem_tactic
    have : i < a.size := by grind
    have : i < b.size := by grind
    res := res.push (a[i] + b[i])
  return res

@[grind →]
theorem range_elim : List.range' s n = xs ++ i :: ys → i = s + xs.length := by
  intro h
  induction xs generalizing s n
  case nil => cases n <;> simp_all [List.range']
  case cons head tail ih =>
    cases n <;> simp [List.range'] at h
    have := ih h.2
    simp [ih h.2]
    omega

theorem add_spec (a b : Array Int) (h : a.size = b.size) :
    ⦃⌜True⌝⦄
    add a b
    ⦃⇓ r => ∃ hr : r.size = a.size, ∀ i (ha : i < r.size), r[i] = a[i] + b[i]⦄ := by
  mvcgen [add]
  case inv => exact ⇓ (r, is) => ∃ hr : r.size = is.pref.length, ∀ i (hr : i < r.size) (ha : i < a.size), r[i] = a[i] + b[i]
  all_goals simp_all; try grind

The bounds checks are annoying; I would try to refactor add to use Vector if possible.
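
A sketch of that refactor (addV is hypothetical; it assumes the core Vector API with ofFn and index-by-Fin getElem):

-- With `Vector`, the size invariant lives in the type, so there are no
-- runtime bounds checks and no `i < a.size` side goals in specs:
def addV {n : Nat} (a b : Vector Int n) : Vector Int n :=
  Vector.ofFn fun i => a[i] + b[i]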

As you can see by the use of simp_all, I haven't yet learned enough about grind to come up with the proper annotations in the logic. By the end of the Q it should just be all_goals grind.

Markus Himmel (Jul 06 2025 at 11:54):

Shameless self-promotion: I have written a blog post about my initial experiences with the monadic verification framework: https://markushimmel.de/blog/my-first-verified-imperative-program/ Feedback is welcome.

Ching-Tsun Chou (Jul 06 2025 at 18:22):

@Markus Himmel The paper link in footnote 1 of your blog post is broken.

Joscha Mennicken (Jul 06 2025 at 18:42):

Maybe due to #general > Discussion: New lean-lang.org? :D

Markus Himmel (Jul 06 2025 at 18:44):

Ching-Tsun Chou said:

Markus Himmel The paper link in footnote 1 of your blog post is broken.

Fixed, thanks.

Ching-Tsun Chou (Jul 07 2025 at 18:55):

@Markus Himmel I have some questions about your blog post:

  1. After the mvcgen, I see 6 goals in InfoView. What exactly does the subsequent case tactic do to them?
  2. After the case, I see 5 goals which look very different from the previous 6 goals. Where do they come from?
  3. Would it be possible for the assertions in Hoare triples to be interspersed with the code? If the loop body itself has a relatively complex control flow, wouldn't mvcgen produce many more VCs which would be hard to keep track of?

Markus Himmel (Jul 07 2025 at 19:21):

Ching-Tsun Chou said:

After the mvcgen, I see 6 goals in InfoView. What exactly does the subsequent case tactic do to them?

case focuses on the goal with the given name (which is the first goal in this case).

Ching-Tsun Chou said:

After the case, I see 5 goals which look very different from the previous 6 goals. Where do they come from?

They are the remaining goals after closing the first goal. They look different than before the case tactic because they mention the loop invariant, which appears as ?inv before the case block and is replaced by the provided loop invariant after the case block.

Ching-Tsun Chou said:

Would it be possible for the assertions in Hoare triples to be interspersed with the code?

There were some experiments with this as seen here, but you'll have to ask Sebastian Graf what the status is.

Alok Singh (Jul 07 2025 at 23:04):

https://github.com/Beneficial-AI-Foundation/NumpySpec/tree/main/NumpySpec has a bunch of autogenerated examples with the new syntax. They’re all sorried rn

Sebastian Graf (Jul 08 2025 at 06:34):

Markus Himmel said:

Ching-Tsun Chou said:

Would it be possible for the assertions in Hoare triples to be interspersed with the code?

There were some experiments with this as seen here, but you'll have to ask Sebastian Graf what the status is.

The PoC over there was promising, but we'll revamp do elaboration in Q4 and add the syntax for invariants etc. as part of that, so you'll have to wait a bit more.

Alok Singh (Jul 08 2025 at 21:23):

Alok Singh said:

https://github.com/Beneficial-AI-Foundation/NumpySpec/tree/main/NumpySpec has a bunch of autogenerated examples with the new syntax. They’re all sorried rn

Files like https://github.com/Beneficial-AI-Foundation/NumpySpec/blob/main/NumpySpec/Numpy_Absolute.lean give the error (which doesn't show up in infoview but does on lake build)

error: NumpySpec/Numpy_Absolute.lean:26:5: type mismatch
  ∀ (i : Fin n), result[i] = x[i].abs
has type
  Prop
but is expected to have type
  Assertion PostShape.pure

Other files have the same issue, but the code seems innocuous. Same error even if I replace the absolute function with an actual impl.

Sebastian Graf (Jul 09 2025 at 06:05):

Note that Assertion PostShape.pure is defeq to Prop, which makes this error all the more puzzling.

My guess is that it is the same underlying issue as #8766 and it would be fixed by #9256. Unfortunately, I had to shelve the patch yesterday after realizing that it breaks type class inference in Mathlib. It's unlikely that we'll see this fix landed.

Try wrapping your Prop in ⌜⌝: ⌜∀ (i : Fin n), result[i] = x[i].abs⌝. That should work reliably. You will also be required to do that in the next release due to universe polymorphism changes (then we'll have Assertion PostShape.pure =?= ULift Prop).
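
Concretely, with the names from the error message above (the surrounding triple is only sketched):

-- fails: a bare `Prop` where an `Assertion` is expected
--   ⦃⇓ result => ∀ (i : Fin n), result[i] = x[i].abs⦄
-- works: wrap the pure proposition in ⌜⌝
--   ⦃⇓ result => ⌜∀ (i : Fin n), result[i] = x[i].abs⌝⦄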

Théophile (Jul 20 2025 at 20:00):

Hi @Sebastian Graf , thanks for the nice work on monadic reasoning! I am currently playing with your framework (especially using State monad), and have some questions (note: I am a beginner in Lean, although knowledgeable in other proof assistants).

I have 4 low-level questions, put as comments in the code present in this playground, summarized here:

  • Question 1: Is there a cool notation for PostCond.partial that I missed?
  • Question 2: sometimes the notation does not work on SPred, I have not been able to understand why (see code)
  • Question 3: is there a nice way to interact with the lemma SPred.entails_cons_intro?
  • Question 4: how to deal with spred(…) in the goal? (grind doesn't know how to deal with it)
  • Feedback 1: misleading error when using mintro without further argument

Other questions and feedback:

  • Feedback 2: the code works on the playground with "latest mathlib" and Lean v4.22.0-rc3 on my computer, but has ugly type errors in the Hoare triples with Lean nightly on the playground?
  • Question 5: is this the implementation of a well-known concept I can read about in a research paper, or is it an invention of your own? (I know about Hoare triples / weakest preconditions, but not about this implementation of SVal / SPred / Assertion / PostCond etc.)
  • Question 6: I have been grepping the lean4 repo to look for the mvcgen implementation but haven't been able to find it; where is it located?

Thanks!

Sebastian Graf (Jul 21 2025 at 09:11):

Thanks for giving Std.Do a try!

Question 1: Is there a cool notation for PostCond.partial that I missed?

None yet. Maybe you have something in mind? I think a use case would help; I feel like PostCond.partial is far less common than PostCond.total, and when you use the former you often want to specify a more refined failure condition, but I might be wrong.

Question 2: sometimes the notation does not work on SPred, I have not been able to understand why (see code)

You need to wrap it with the spred quoter. The syntax for PostCond.total does this automatically for you:

  Std.Do.PostCond.partial (fun x => spred(state_invariant ∧ pred1 x))

Question 3: is there a nice way to interact with the lemma SPred.entails_cons_intro?

You can use mintro ∀s to do that! Sorry about the lack of documentation/reference manual content at the moment.

Question 4: how to deal with spred(…) in the goal? (grind doesn't know how to deal with it)

spred(...) is just a macro. It means that the usual prop notation such as ∧ actually desugars to its SPred variant such as SPred.and.
Note that using simp will "destroy" the SPred proof mode and should only be used as an endgame tactic.
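
To illustrate the desugaring (the exact target names here are from memory and may differ):

-- Within `spred(...)`, the connectives elaborate to their `SPred` variants:
--   spred(P ∧ Q)  -- `SPred.and P Q`, not `And P Q`
--   spred(P → Q)  -- `SPred.imp P Q`, not a plain implication
-- which is why `grind` (which works on `Prop`) does not see through them.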


I can see that even with those changes there is no good way to conclude your proof, which is unfortunate.
I think that is caused by relying too much on SPred [TheState] being defeq to TheState -> Prop.
You will find that the latest nightly unfortunately breaks this assumption completely, because now SPred [TheState] is defeq to TheState -> ULift Prop.
I changed your test file to reflect that. Still, I would expect that there is a manual proof using the m*-style tactics of the SPred proof mode; however, my current attempt chokes on mintro not having reduced the application of SPred.and to the state s.
I will add your code as a test case, thanks!

Théophile (Jul 21 2025 at 13:25):

Cool, thank you for the answers!

None yet. Maybe you have something in mind? I think a use case would help; I feel like PostCond.partial is far less common than PostCond.total, and when you use the former you often want to specify a more refined failure condition, but I might be wrong.

I am interested in proving security properties of cryptographic protocols that operate in an adversarial environment, so functions may fail for a variety of reasons (e.g. a message may fail to parse, a signature verification may say "invalid" and we don't proceed further, or a decryption may fail). Therefore in that area we would rely on PostCond.partial, because we cannot meaningfully characterize when a function succeeds.

Sorry about the lack of documentation/reference manual content at the moment.

No worries, this is expected when using cutting-edge features :)

Sebastian Graf (Jul 21 2025 at 13:32):

Therefore in that area we would rely on PostCond.partial, because we cannot meaningfully characterize when a function succeeds.

That makes sense. I suppose your language does not have any means to catch an "exception" then? Then yes, PostCond.partial is what you want.

Théophile (Jul 21 2025 at 14:00):

Yes, catching exceptions doesn't seem useful in that area (other than e.g. reporting errors to the user, but that's outside the scope of what we prove)

Markus de Medeiros (Jul 21 2025 at 18:48):

I finally looked closer at this project and it is super cool! Now that iris-lean is further along, I'm thinking a bit about how our projects can interoperate. I have a ton of questions that might help me understand it better (and a few pieces of feedback). If you have time to answer any of them I would appreciate it!

High-level:

  1. I know you mentioned above that documentation is not a priority right now, but are there any existing program logics you're modeling this after that I might be able to read more about? I've read a few loosely related papers about these logics based on predicate transformers but I'm still very green.
  2. What work needs to be done to extend the framework with a new monadic effect? I'd guess that you'd need at least a WP and WPMonad instance, but what does this workflow look like in general?
  3. Related, I see a bunch of lemmas around the development written for specific monads (for example, *_run, the MonadLift/MonadFunctor instances related to PredTrans) and I don't understand which are meant for "object-level" monads vs "specification-level" monads (those special ones you have PostShapes for).

Low-level questions:

  1. What is the HasFrame class (resp. Frame.frame proof rule) meant to be used for? Do you have any examples that use it?
  2. Why is PostShape an inductive type? Is there any way to specify postconditions for effects other than those that are state-like (with PostShape.arg) or error-like (with PostShape.except)?
  3. Why do PredTrans need to be conjunctive? I'm not surprised that they would be, I'm more just curious about where this constraint arises.
  4. If I understand correctly, the *.by_wp theorems are your adequacy theorems. Why only Id, StateM, and EStateM? Do you expect there to be something more general at play here?
  5. Related to the last one, I noticed that they all have PostCond.total postconditions. Is the logic only meant to have PostCond.total postconditions at the top level or do you think that other adequacy statements are provable?

Feedback:

  • The "Barrel" term is used a couple times, does this mean anything?
  • The term "partial" is overloaded, I think here you mean safe p -> postcondition p (right?) but people also use that to mean terminating p -> postcondition p.

Again thanks for anything you have time to answer :)

Sebastian Graf (Jul 22 2025 at 08:57):

I'm glad you like it :)

are there any existing program logics you're modeling this after that I might be able to read more about?

I... don't think there are. I used predicate transformers as the semantic foundation, as that is how Hoare triples are commonly defined. I recalled traditional predicate transformers for an IMP-style language (there is plenty of material on the internet, e.g., Wikipedia) and connected the dots for monadic programs. The initial challenge then was finding a good translation from arbitrary monads into predicate transformers. I played with Dijkstra monads (which translate an algebra of monads into a corresponding continuation monad) but that turned out to be cumbersome (search the thread). I simplified the approach into the WP and WPMonad type classes and was quite happy with that. Nice and boring :)

Perhaps there is a connection to Iris in that every StateT σ gives rise to an additional σ -> . parameter in the assertion, and hypotheses on these state parameters work like persistent propositions in Iris (I call these "stateful"). I.e., you may duplicate them but once you apply the rule of consequence (for c1 >> c2) they are "gone" (because the state changes and the hypothesis may no longer hold). This is in contrast to pure hypotheses which live in the regular Lean context.

What work needs to be done to extend the framework with a new monadic effect? I'd guess that you'd need at least a WP and WPMonad instance, but what does this workflow look like in general?

If your effect needs a new monad, then yes, you need to instantiate WP and WPMonad. Then you also need to prove and register a specification for your new effect (i.e., some monadic action), for which you need to unfold your definition of WP.wp. SpecLemmas.lean has them all for the pre-defined ones. This mechanism also works for loop invariants (e.g., forIn_list).
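
If the action lives in a monad that already has a WP instance, registering a spec is just an @[spec] lemma; a minimal made-up example (the proof script may vary):

def twice (n : Nat) : Id Nat :=
  pure (2 * n)

@[spec]
theorem twice_spec (n : Nat) :
    ⦃⌜True⌝⦄ twice n ⦃⇓ r => ⌜r = 2 * n⌝⦄ := by
  mvcgen [twice]
  all_goals simp_all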

Related, I see a bunch of lemmas around the development written for specific monads (for example, *_run, the MonadLift/MonadFunctor instances related to PredTrans) and I don't understand which are meant for "object-level" monads vs "specification-level" monads (those special ones you have PostShapes for).

The only "specification-level" monad is PredTrans ps for some ps : PostShape. Users of the VC generator never interact with this monad (or even the fact that it is a monad) unless they are specifying their own effects.
I suppose you are referring to the *_apply lemmas in the PredTrans development? I just took a second look and deleted them. The approach I settled for no longer needs them and they are mostly confusing. Thanks for making me take that second look.
The reason I don't need them is that it's often much preferable to stay in the "syntactic"/definable world where we see wp[someEffect a b c] rather than in the "semantic" world somePredTrans wp[someEffectPart1] wp[someEffectPart2] ... due to reasons of full abstraction/definable elements. (This is important when handling scoped effects such as withReader.)

What is the HasFrame class?

It is used by the mframe tactic. It tries to lift as many stateful hypotheses into the pure Lean context as possible. That is, a smart mpure that infers the pure "frame" of hypotheses that can survive the next effect.
This tactic might be useful for iris-lean as well (although I find it unfortunately unlikely that we can share implementations).
Here's the docstring:

`mframe` infers which hypotheses from the stateful context can be moved into the pure context.
This is useful because pure hypotheses "survive" the next application of modus ponens
(`Std.Do.SPred.mp`) and transitivity (`Std.Do.SPred.entails.trans`).

It is used as part of the `mspec` tactic.

```lean
example (P Q : SPred σs) : ⊢ₛ ⌜p⌝ ∧ Q ∧ ⌜q⌝ ∧ ⌜r⌝ ∧ P ∧ ⌜s⌝ ∧ ⌜t⌝ → Q := by
  mintro _
  mframe
  /- `h : p ∧ q ∧ r ∧ s ∧ t` in the pure context -/
  mcases h with hP
  mexact h
```

Note that the stateful proof mode is entirely independent from the weakest precondition translation/Hoare triples.

Why is PostShape an inductive type? Is there any way to specify postconditions for effects other than those that are state-like (with PostShape.arg) or error-like (with PostShape.except)?

It is inductive precisely because I wanted a simple, closed definition! Do you have an effect in mind that cannot be modelled using this notion of postcondition? As I wrote in the OP (RFC (2)), I would be very keen to know about this use case.

Why do PredTrans need to be conjunctive? I'm not surprised that they would be, I'm more just curious about where this constraint arises.

One user wanted to separately prove a global spec {P1} x {Q1} and a more local spec {P2} x {Q2} to get {P1 /\ P2} x {Q1 /\ Q2} (cf. Triple.and) without repeating the proof for the global spec, and making this a native property of PredTrans was (sadly) the only way I could make that work.

If I understand correctly, the *.by_wp theorems are your adequacy theorems. Why only Id, StateM, and EStateM? Do you expect there to be something more general at play here?

Yeah, this is just an oversight. I should add the other theorems as well :) Thanks for the nudge.

Related to the last one, I noticed that they all have PostCond.total postconditions. Is the logic only meant to have PostCond.total postconditions at the top level or do you think that other adequacy statements are provable?

I just strengthened EStateM.by_wp to

theorem EStateM.by_wp {α} {x : EStateM.Result ε σ α} {prog : EStateM ε σ α} (h : EStateM.run prog s = x) (P : EStateM.Result ε σ α → Prop) :
  (⊢ₛ wp⟦prog⟧ post⟨fun a s' => P (EStateM.Result.ok a s'), fun e s' => P (EStateM.Result.error e s')⟩ s) → P x

The "Barrel" term is used a couple times, does this mean anything?

Ah, this is PL semantics lingo relating to "double-barrelled continuations". Perhaps it is not common enough a term that I should use it. "Barrel" in this context just refers to either the success or one of the failure "continuations"/assertions in a PostCond.
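
In code, the two barrels are just the components of the PostCond tuple, as in EStateM.by_wp above:

-- post⟨fun a s' => ...,  -- success barrel: result `a`, final state `s'`
--      fun e s' => ...⟩  -- failure barrel: exception `e`, final state `s'`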

The term "partial" is overloaded, I think here you mean safe p -> postcondition p (right?) but people also use that to mean terminating p -> postcondition p.

Yeah, I actually don't like "partial" very much because it often conflates divergence of a program with throwing an exception. The former is a meta-property of the program but the latter is a recoverable object-level property of the program. But in the programs we consider, divergence does not play any role, so partiality refers to whether or not an exception has been thrown.
Then "partial correctness" indeed means safe p -> postcondition p (if safe p means "no exception in p") and is expressed via PostCond.partial.

Do you maybe have a suggestion for better names?

Markus de Medeiros (Jul 22 2025 at 14:26):

Thanks for the response, you answered most of my questions very decisively!

Perhaps there is a connection to Iris ...

Yeah there's probably something here. That said there are also a variety of ways you might want to relate the state in a state monad (and in fact all other parts of the program state) to Iris state. At a high level, I'm trying to understand which problems that Std.Do is good at solving, so the weaknesses of one project can be supported by the strengths of the other.

One idea which is present in quite a few Iris papers is that programs in complicated languages can have much simpler semantic models. Type systems are one example, Std.Do-style monadic programs are another. So if I can use Iris to prove that my horrifying deeply embedded program refines a simpler Std.Do model, your logic may be able to take over.

The only "specification-level" monad is...
I suppose you are referring to the *_apply lemmas in the PredTrans development?

Sure, I was confused about those, but (with apologies for the #XY) in general I was trying to figure out which lemmas are meant to be applied to specifications versus applied to objects. For example: you laid out the steps I would need to take to support code that uses a new monad. In that case do I also need to add a new *.by_wp theorem? Or are the *.by_wp theorems "specification-level" in the sense that they correspond to the cases of PostShape and therefore don't need to be extended to support new effects?

Don't worry about it if this question doesn't have a clear answer. I know the object/specification line is blurry in this world.

The reason I don't need them is that it's often much preferable to stay in the "syntactic"/definable world

By this do you mean "definable using the syntax of do notation" or something else? I kind of hope that the former isn't the case. In SampCert (where we had shallowly embedded monadic programs) we pretty frequently used terms that couldn't be written using do in intermediate stages of our proofs.

Do you have an effect in mind that cannot be modelled using this notion of postcondition?

None come to mind. It is definitely good to know where this decision came from. You don't need to convince an Iris user that State itself can express a lot. What still confuses me though is why you need this type to be closed to extension... maybe I missed this in your code, but do you have an example of where having this be closed is useful?

Ah, this is PL semantics lingo relating to "double-barrelled continuations".

Just a gap in my PL jargon then, no need to change it :)

Do you maybe have a suggestion for better names?

For partiality, I just think a comment is fine. The term is used inconsistently so trying to pick a universally understood one is a lost cause haha.

Sebastian Graf (Jul 23 2025 at 11:18):

relate the state in a state monad (and in fact all other parts of the program state) to Iris state.

So a stack of StateT σ1 (StateT σ2 (... (StateT σn Id))) Nat leads to assertions p : σ1 -> ... -> σn -> Prop. In a Separation Logic you could encode such an assertion as l1 |-> s1 * l2 |-> s2 * ... * ln |-> sn * [p s1 s2 ... sn]. That is, every StateT corresponds to a "static" address li which is by birth alias-free with any of the other StateTs.
What SL can do and the StateT encoding can't is "dynamic" memory allocation; that is, implement ST.mkRef. The equivalent StateT operation would need to change the type and generally require some sort of polymorphic recursion/lenses in the object program. Of course you could allocate all your dynamic ref cells from a heap stored in one particular StateT layer, but then you will lack the aliasing guarantees that SL provides you (unless you implement an SL just for that single StateT layer).

So: if you have an Iris proof for some program in ST (which I doubt is a well-formed statement given that Iris needs to deeply embed its object language and give it an OpSem) which does not itself allocate, you could probably translate your proof obligation into Std.Do and blast through the proof with mvcgen and grind.

In that case do I also need to add a new *.by_wp theorem? Or are the *.by_wp theorems "specification-level" in the sense that they correspond to the cases of PostShape and therefore don't need to be extended to support new effects?

Good question. I don't know :) You probably need to define such a theorem for every new WP instance. These lemmas are mostly useful if (1) you have some pure proof obligation such as isPrime n = true and (2) isPrime is internally implemented using M.run prog (for example M=Id). Then the by_wp lemma is convenient to expose the monadic prog and start the proof in terms of wp[prog] Q. Here's an example from the kitchen sink test file. Different Ms need different by_wp lemmas.
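
A sketch of that pattern with a made-up isEven (the proof outline mirrors the generalize trick used elsewhere in this thread):

-- Hypothetical pure function implemented via a do block:
def isEven (n : Nat) : Bool := Id.run do
  let mut b := true
  for _ in [0:n] do
    b := !b
  return b

-- To prove the pure fact `isEven 4 = true`, a `by_wp`-style lemma exposes
-- the do block, roughly:
--   generalize h : isEven 4 = b
--   apply Id.of_wp_run_eq h
--   mvcgen
--   ...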

In SampCert (where we had shallowly embedded monadic programs) we pretty frequently used terms that couldn't be written using do in intermediate stages of our proofs.

I meant "definable in the computational monad m α". I suppose these terms in SampCert you are referring to are still definable as a computation in m α? By contrast, the semantic domain PredTrans of WP.wp is far larger than the image of WP.wp. Many properties can only be shown considering the image of WP.wp (the "definable elements" of PredTrans), a classic gotcha of denotational semantics. Hence it is best to keep WP.wp "as far out as possible"; pushing it inwards by unfolding loses information. But I suppose this is not super important for users of Std.Do :)

What still confuses me though is why you need this type to be closed for extension... maybe I missed this in your code but do you have an example of where having this be closed is useful?

One reason I like it closed is that I can define PredShape.args as a closed definition. Maybe that could live in a type class that an abstract PredShape would need to implement... And similarly we would need a method for FailConds and all its properties.
Doable, but it is unclear why you would want to do that when keeping it closed is the simpler and more direct solution :)

Markus de Medeiros (Jul 23 2025 at 13:35):

which I doubt is a well-formed statement given that Iris needs to deeply embed its object language and give it an OpSem

One of the ideas I'm exploring right now is that an iris-lean program logic could be used to relate a model of a programming language that does have an operational semantics (e.g. Rust à la RustBelt or C à la RefinedC) to an ST program. In this case, StateT state doesn't have to correspond to physical state, just to the stateful part of your program that you're trying to model. In this setup, iris-lean becomes a great way to construct models of your program and Std.Do becomes a great way to reason about them. Could be interesting, but I'm still thinking about it!

I meant "definable in the computational monad m α".

Great, that's what I was hoping you meant. Thanks for the explanations :)

Bas Spitters (Aug 02 2025 at 18:02):

@Markus de Medeiros @Sebastian Graf The experimental Lean backend for the Hax tool for verifying Rust programs already uses Std.Do. E.g. https://github.com/cryspen/hax/pull/1590

Sebastian Graf (Aug 04 2025 at 12:52):

Nice! hax-lib/proof-libs/lean/Lib.lean looks like what I had in mind to support Aeneas.

Clément Blaudeau (Aug 04 2025 at 13:08):

Thanks! Indeed we're very interested in your work @Sebastian Graf , as we're building a Lean backend (still very early) for Hax. So far, we are very happy with the design, and mvcgen works quite well for our use case.

I have a couple of questions :

  1. When defining Hoare triples, is it possible to have the precondition in the context for the definition of the post-condition? For instance, this is ill-formed:
@[spec]
theorem test_spec (α : Type) (a : Array α) (i : Nat) :
  ⦃⌜i < a.size⌝⦄
  (Result.ofOption a[i]? arrayOutOfBounds)
  ⦃⇓ r => ⌜r = a[i]⌝⦄ := by sorry

Indeed, the hypothesis i < a.size is not accessible in the post-condition, making a[i] ill-formed. It can be solved by turning i < a.size into an extra argument of the theorem, but it feels less natural than keeping it in the Hoare triple

  2. We define Hoare triples for our arithmetic operations with overflows, which look like this (where +? is our custom add):
@[spec]
theorem HaxAdd_spec_bv (x y : isize) :
  ⦃⌜¬ (BitVec.saddOverflow x.toBitVec y.toBitVec)⌝⦄
  (x +? y)
  ⦃⇓ r => ⌜r = x + y⌝⦄ := by mvcgen [instHaxAdd]

However, we also need rewriting rules (because +? can also appear in postconditions, once the job of mvcgen is finished), like:

theorem HaxAdd_spec_bv_rw (x y : isize) :
  ¬ (BitVec.saddOverflow x.toBitVec y.toBitVec) →
  x +? y = Result.ok (x + y)
:= by simp [instHaxAdd]

Is there a way to use the Hoare triples as rewriting rules: use HaxAdd_spec_bv instead of HaxAdd_spec_bv_rw in proofs?

Clément Blaudeau (Aug 05 2025 at 10:57):

  3. Are there ways to define several attributes besides spec, so that mvcgen could be configured to use different sets of specifications? Typically, we have bitvector-based specifications and nat/int-based specifications depending on the use cases:
/-- Bitvec-based specification for rust multiplication on usize -/
@[spec_bv]
theorem HaxMul_spec_bv (x y : usize) :
  ⦃⌜¬ (BitVec.umulOverflow x.toBitVec y.toBitVec)⌝⦄
  (x *? y)
  ⦃⇓ r => ⌜r = x * y⌝⦄ := by ...

/-- Nat-based specification for rust multiplication on usize -/
@[spec_nat]
theorem HaxMul_spec_nat (x y : usize) :
  ⦃⌜x.toNat * y.toNat < USize.size⌝⦄
  (x *? y)
  ⦃⇓ r => ⌜r = x * y⌝⦄ := by ...

It would be very useful to be able to do mvcgen with spec_bv or mvcgen with spec_nat and get the right preconditions

Sebastian Graf (Aug 05 2025 at 11:17):

So far, we are very happy with the design, and mvcgen works quite well for our use case.

Thanks, that's very nice to hear :)

It can be solved by turning i < a.size into an extra argument of the theorem, but it feels less natural than keeping it in the hoare triple

Moving it into the regular Lean context is exactly what I would do. It is desirable to move as many facts as possible into the pure Lean context so that they survive applications of the rule of consequence (i.e., SPred.entails.trans). The mframe tactic tries to help and automate such moves, and it is automatically applied by mspec (and mvcgen) to do these moves for you, but it doesn't hurt to define specs in a way that has "maximum information content".

Is there a way to use the Hoare triples as rewriting rules: use HaxAdd_spec_bv instead of HaxAdd_spec_bv_rw in proofs?

Sorry, I don't completely understand what you want to do. Per the previous answer, I would write HaxAdd_spec_bv as

@[spec]
theorem HaxAdd_spec_bv (x y : isize) (h : ¬ (BitVec.saddOverflow x.toBitVec y.toBitVec)) :
  ⦃⌜True⌝⦄
  (x +? y)
  ⦃⇓ r => ⌜r = x + y⌝⦄ := by mvcgen [instHaxAdd]

Why would x +? y appear in postconditions? It's part of the program syntax, so it should only occur in the program position of wp[.] or of Hoare triples {P} . {Q}. Unless you mean inside a wp application?

Are there ways to define several attributes besides spec?

Hmm. I don't have plans for this kind of feature... However, would it work for you to use attribute [scoped spec] and then open the respective namespace?

/-- Bitvec-based specification for rust multiplication on usize -/
theorem HaxMul_spec_bv (x y : usize) :
  ⦃⌜¬ (BitVec.umulOverflow x.toBitVec y.toBitVec)⌝⦄
  (x *? y)
  ⦃⇓ r => ⌜r = x * y⌝⦄ := by ...

namespace BV

attribute [scoped spec] HaxMul_spec_bv

end BV

/-- Nat-based specification for rust multiplication on usize -/
theorem HaxMul_spec_nat (x y : usize) :
  ⦃⌜x.toNat * y.toNat < USize.size⌝⦄
  (x *? y)
  ⦃⇓ r => ⌜r = x * y⌝⦄ := by ...

namespace Nat

attribute [scoped spec] HaxMul_spec_nat

end Nat

...
  open BV in
  mvcgen ...
...

Clément Blaudeau (Aug 07 2025 at 11:04):

Thank you for the answers!

  • For 1: indeed, moving every pure fact out of the precondition made mvcgen happier and solved the issue
  • For 3: scoped attributes do the trick, allowing for a nice open SpecBV in mvcgen or open SpecNat in mvcgen

Regarding 2: as we export pre- and post-conditions from the Rust code, we can have Hoare triples that look like this (where the barrett function reduces modulo 3329):

def __ensures (value : i32) (result : i32) := do
  pure ((← value %? 3329) = result)

theorem barrett_spec (value : i32) :
  ⦃⌜value ≤ 0x40000⌝⦄
  (barrett_reduce value)
  ⦃⇓ result => ⌜__ensures value result = pure true⌝⦄ := by
 ...

Indeed, the Rust-written post-condition __ensures will also mention panicking arithmetic %?. Therefore, once all the panic-freedom side conditions have been proved and only the post-condition remains, I would like to do rewrites on %? that are basically the same as the Hoare triple.

Sebastian Graf (Aug 07 2025 at 12:11):

Ah, sounds like you actually want a relational/binary Hoare logic to prove that barrett_reduce value is bisimilar to value %? 3329. That's tricky and not currently a use case that is actively supported by Std.Do.

In your particular example you can first run mvcgen on barrett_reduce value, and then you will have some proof obligation in which __ensures value result occurs. You would run mvcgen on that once more.
That said, this approach is not particularly compositional and is likely unpleasant to work with.

Regarding your original question (2), I imagine you can define a lemma that derives HaxAdd_spec_bv_rw from HaxAdd_spec_bv. This could be helpful (this is assuming nightly; leave out .down on the RC):

theorem Result.by_wp {α} {x : Result α} (P : Result α → Prop) :
  (⊢ₛ wp⟦x⟧ post⟨fun a => ⌜P (.ok a)⌝, fun e => ⌜P (.fail e)⌝⟩) → P x := by
    intro hspec
    simp only [instWP] at hspec
    split at hspec <;> simp_all [PredTrans.const]

theorem Result.to_rw {α} {x : Result α} {a : α} :
  (⊢ₛ wp⟦x⟧ (⇓ r => r = a)) → x = .ok a := by
    intro hspec
    apply Result.by_wp (P := (· = .ok a))
    simp at *
    exact hspec

theorem Result.to_rw' {α} {x : Result α} {a : α} :
  (⦃P⦄ x ⦃⇓ r => r = a⦄) → P.down → x = .ok a := by
    intro hspec hp
    apply Result.by_wp (P := (· = .ok a))
    simp at *
    exact hspec hp

Clément Blaudeau (Aug 07 2025 at 14:57):

Thank you for the rewriting lemmas!

In this specific example (barrett reduction), I might want a proof of bisimilarity, but in a lot of cases we're only interested in panic-freedom, which seems to fit the Std.Do use cases. Interestingly, the different (while, I believe, equivalent) ways of writing the same specification are handled differently by mvcgen. Considering that __requires and __ensures return a Result Bool, we have:

theorem barrett_spec1 (value : i32) :
  ⦃⌜__requires value = pure true⌝⦄
  (barrett_reduce value)
  ⦃⇓ result => ⌜__ensures value result = pure true⌝⦄
:= by sorry

theorem barrett_spec2 (value : i32) :
  ⦃⌜__requires value = pure true⌝⦄
  (barrett_reduce value)
  ⦃⇓ result => wp⟦__ensures value result⟧ (⇓ r => r)⦄
:= by sorry

theorem barrett_spec3 (value : i32) :
  wp⟦__requires (value)⟧ post⟨
    fun r => r = true ⊢ₛ
      wp⟦barrett_reduce value⟧
        (⇓ result =>
        wp⟦__ensures value result⟧ (⇓ r => r)),
    fun _ => True⟩  -- if the precondition panics, nothing to prove
:= by sorry

theorem barrett_spec4 (value : i32) :
  wp⟦__requires value⟧ post⟨
  fun r => r = true ⊢ₛ
    wp⟦(do let result ← barrett_reduce value;
           __ensures value result)⟧ (⇓ r => r),
  fun _ => True⟩  -- if the precondition panics, nothing to prove
:= by sorry

The first one (that I originally wrote) requires rewriting lemmas to prove __ensures .. = pure true, and tedious splitting on the hypothesis __requires .. = pure true. The second is much better for mvcgen, as it also steps through __ensures, but still requires manual splitting on the hypothesis __requires .. = pure true. The third can be fully handled by mvcgen, although I need to run it twice (first on the precondition, then use apply SPred.pure_intro; intros, then re-run it). The fourth is equivalent to the third one I think, but maybe more readable? (debatable). Do you have an opinion on the style of specifications and their potential limitations (from a proof point of view)?

Thank you for your help!

Jakob von Raumer (Aug 07 2025 at 23:27):

Is it expected that simp_all doesn't simplify away the wp occurrences in this example?

def matchFn (a b : Bool) : Id (Nat × Nat) := do
  let x : Nat := match a with
    | .true => 42
    | .false => 23
  let y : Nat := match b with
    | .false => 4
    | .true => 8
  (x, y)

theorem matchFn_spec {a b} :
    ⦃⌜True⌝⦄
    matchFn a b
    ⦃post⟨fun (x, y) => ⌜y % 2 = 0⌝⟩⦄ := by
  mvcgen [matchFn]
  split <;> split <;> simp_all
  sorry

Jakob von Raumer (Aug 07 2025 at 23:33):

(Sorry if this is more or less the same issue that @Clément Blaudeau was asking about :thinking: )

Jakob von Raumer (Aug 08 2025 at 00:41):

Generally, I also struggle to find an easy way to go between a simple implication and the Hoare triple for total functions, which is relevant at points where monadic code and pure code touch, e.g. via Id.run inside a pure function.

Sebastian Graf (Aug 08 2025 at 07:29):

Do you have an opinion on the style of specifications and their potential limitations (from a proof point of view) ?

I feel like I'm lacking context here, but I think I would first try to convert the pre- and postconditions from a "program" involving Result Bool into something without it, by running a metaprogram that does mvcgen/simp_all on

⊢ₛ wp⟦__requires (value)⟧ (⇓ r => r = true)

This should yield the desired Prop that you can put in the precondition.

Alternatively, you could provide a tactic that your users should run before the actual mvcgen invocation in barrett_spec that rewrites the pre- and postconditions using (the inverse of) Result.to_rw and mvcgen.

Sebastian Graf (Aug 08 2025 at 07:31):

Is it expected that simp_all doesn't simplify away the wp occurrences in this example?

It kind of is, because there is no monadic computation that it could apply the spec to. If you insert the pure (x, y) that is implicit by defeq, you'll get

a b : Bool
x : Nat := match a with
| true => 42
| false => 23
y : Nat := match b with
| false => 4
| true => 8
⊢ y % 2 = 0

and can proceed however you want.
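
That is, the defeq-explicit version would read (sketch):

def matchFn' (a b : Bool) : Id (Nat × Nat) := do
  let x : Nat := match a with
    | .true => 42
    | .false => 23
  let y : Nat := match b with
    | .false => 4
    | .true => 8
  pure (x, y)  -- explicit `pure` instead of the bare `(x, y)`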

Sebastian Graf (Aug 08 2025 at 07:35):

Generally, I also struggle to find an easy way to go between an simple implication and the Hoare triple for total functions, which is relevant at points where monadic code and pure code touch, e.g. by Id.run inside a pure function.

Does Id.by_wp help?

theorem Id.by_wp {α : Type u} {x : α} {prog : Id α} (h : Id.run prog = x) (P : α → Prop) :
  (⊢ₛ wp⟦prog⟧ (PostCond.total (fun a => P a))) → P x := h ▸ (· True.intro)

You use it with generalize, like in Bhavik's example (and a few others), to "focus" on/punch a hole into the part of the target P (i.e., a function application) that is defined by Id.run, and then you can use mvcgen on its do block to show the target P.

Sebastian Graf (Aug 08 2025 at 12:03):

While adding some grind lemmas to eliminate the idiosyncratic Std.Range.toList range = xs ++ y :: ys assumptions that arise through the current encoding of loop invariants using the new Std.List.Zipper, I got carried away trying out a different encoding for loop invariants.

In lean4#9766, I switched to using a Fin (xs.length + 1) to encode the iteration state of the loop. All in all I like that, because it makes specification of loop invariants simpler (below, xs used to be a Zipper; now it's just the list the loop iterates over):

-  case inv => exact (⇓ (r, xs) => (∀ x, x ∈ xs.suff → x ≤ 5) ∧ r + xs.suff.length * 5 ≤ 25)
+  case inv => exact (⇓ (r, xs, i) => r + (xs.drop i).length * 5 ≤ 25)

Unfortunately, some of the proofs get more complicated due to type dependencies; a hyp h : i < [1:5].toList.length in the local context will not be simplified to h : i < 4 if h is used in an array bounds check.

Does anyone have an opinion here? I tend to prefer the Fin-based encoding because it means one fewer concept (zippers) to learn for users.

Sebastian Graf (Aug 12 2025 at 17:36):

I rejected the Fin (xs.length + 1) idea and parked it at https://github.com/leanprover/lean4/commits/sg/parked-fin-specs/.

Sebastian Graf (Aug 12 2025 at 17:36):

Here's where I'm drafting a reference manual entry/tutorial: https://hackmd.io/@sg-fro/BJRlurP_xg

Bhavik Mehta (Aug 12 2025 at 17:40):

Earlier in the thread you wrote

Moving it into the regular Lean context is exactly what I would do. It is desirable to move as many facts as possible into the pure Lean context so that they survive applications of the rule of consequence (i.e., SPred.entails.trans).

and I think something along these lines would be a welcome addition to the entry, since I had otherwise assumed the exact opposite would be true

Alok Singh (Aug 13 2025 at 07:44):

Wrapping functions to be verified in Id since it requires a monad is kinda ugly (but makes sense)

Bhavik Mehta (Aug 13 2025 at 12:44):

That's not specific to this framework, that pattern was around for perhaps a couple of years before this!

Alok Singh (Aug 13 2025 at 21:19):

TIL

Sebastian Graf (Aug 14 2025 at 16:47):

Bhavik Mehta said:

and I think something along these lines would be a welcome addition to the entry, since I had otherwise assumed the exact opposite would be true

added a subsection as a digression. Not good writing style, but it should do the job.

The linked tutorial should be "complete enough" now. As always, I'm open for feedback! Though I'll be away for a month now, so I can't really incorporate feedback until after that.

Frédéric Dupuis (Aug 17 2025 at 01:15):

Is there a trick required to make mvcgen work with loops over Std.PRange?

Here's an MWE that has me puzzled:

import Std.Tactic.Do

open Std.Do

def foo (l : List Nat) : Nat := Id.run do
  let mut out := 0
  for _ in l do
    out := out + 1
  return out

def bar (n : Nat) : Nat := Id.run do
  let mut out := 0
  for _ in 0...n do
    out := out + 1
  return out

/-- This works as expected: -/
theorem foo_eq (l : List Nat) : foo l = l.length := by
  generalize h : foo l = x
  apply Id.of_wp_run_eq h
  mvcgen
  case inv1 =>
    exact ⇓⟨xs, out⟩ => ⌜True⌝
  case vc1 => sorry
  case vc2 => sorry
  case vc3 => sorry

/-- This fails in `inv1`: -/
theorem bar_eq (n : Nat) : bar n = n := by
  generalize h : bar n = x
  apply Id.of_wp_run_eq h
  mvcgen
  case inv1 =>
    -- Invalid match expression: The type of pattern variable 'xs' contains metavariables:
    -- (0...n).toList.Cursor
    exact ⇓⟨xs, out⟩ => ⌜True⌝
  case vc1 => sorry
  case vc2 => sorry
  case vc3 => sorry
  case vc4 => sorry

Sebastian Graf (Aug 17 2025 at 13:48):

It seems that it fails to synthesize the instance for Std.PRange.HasFiniteRanges Std.PRange.BoundShape.open Nat which is one of the requirements of the spec lemma for PRanges. Not sure why that is; #synth Std.PRange.HasFiniteRanges Std.PRange.BoundShape.open Nat succeeds. Need to investigate.

Edit: You can work around it by inferring the instance vc1 before specifying the invariant inv1:

theorem bar_eq (n : Nat) : bar n = n := by
  generalize h : bar n = x
  apply Id.of_wp_run_eq h
  mvcgen
  case vc1 => exact inferInstance
  case inv1 =>
    exact ⇓⟨xs, out⟩ => ⌜True⌝
  case vc2 => sorry
  case vc3 => sorry
  case vc4 => sorry

Dexin Zhang (Oct 04 2025 at 19:49):

Is there a plan to support loop or while loops in this framework? This might be tricky, since loop is now a partial def, and even partial_fixpoint is said not to be planned.

Robin Arnez (Oct 05 2025 at 16:42):

Supporting repeat is tricky because as it is you can't do anything with it (as you said because of partial). If we want to change that we have to face a problem: We want it to still work for all monads (because it did before) but we also want to guarantee that it actually does something reasonable on infinite loop (and not always something completely random through Classical.choice). For example, an infinite loop in Option should be none. But then there are monads like ExceptT Bool Option. Without any more information than the monad instance, you can't distinguish between none, throw false and throw true but we want to ensure an infinite loop is none. Thus, we need more information, e.g. an additional type class. But then repeat would behave differently before and after defining an instance of such type class, where it would first return any of the three exceptions on infinite loop and after defining an instance return none in particular. Thus programs written before vs. after that point behave differently in logic, which could be confusing.
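
For concreteness (sketch), the three failure values in that example stack are all distinct terms:

-- `ExceptT ε m α` unfolds to `m (Except ε α)`, so in `ExceptT Bool Option`
-- a generic, `Monad`-only `repeat` has no way to prefer one of these:
example : ExceptT Bool Option Nat := ExceptT.mk none                  -- inner `none`
example : ExceptT Bool Option Nat := ExceptT.mk (some (.error false)) -- `throw false`
example : ExceptT Bool Option Nat := ExceptT.mk (some (.error true))  -- `throw true`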

(deleted) (Oct 05 2025 at 18:15):

Yeah let's not get into termination hell. I think we can do a lot with constrained looping primitives (recursors).

(deleted) (Oct 05 2025 at 18:16):

If you want while, or arbitrary recursion, it's possible to use an inductive predicate

Joachim Breitner (Oct 05 2025 at 18:35):

Would it be so bad if the loop and while keywords detect when they have a monad with suitable structure and have different, partial_fixpoint-like logical content that allows partial correctness proofs using mvcgen? All existing users likely only care about the compiled behavior, and that wouldn't change, would it?

(deleted) (Oct 05 2025 at 18:46):

No it's a can of worms

(deleted) (Oct 05 2025 at 18:47):

Let's say there's a program that prints hello world forever

(deleted) (Oct 05 2025 at 18:47):

How is the program represented

(deleted) (Oct 05 2025 at 18:47):

Things are nice and dandy if there are no side effects

Joachim Breitner (Oct 05 2025 at 19:26):

Already now, with partial, the logical content of such a program is unspecified. Are you saying users currently rely on that somehow?

Robin Arnez (Oct 05 2025 at 19:26):

My idea would've been:

class MonadLoop (m : Type u → Type v) [Monad m] where
  loop {α : Type u} : (α → m (ForInStep α)) → α → m α
  loop_of_exists {α : Type u} : ∀ f,
    (∃ g, ∀ a : α, g a = f a >>= (·.casesOn pure g)) →
    ∀ a : α, loop f a = f a >>= (·.casesOn pure (loop f))

noncomputable instance (priority := low) MonadLoop.default (m : Type u → Type v) [Monad m] :
    MonadLoop m where
  loop _ := Classical.epsilon _
  loop_of_exists _ h := Classical.epsilon_spec h

def MonadLoop.fixByOrder [Monad m] [∀ α, Lean.Order.CCPO (m α)] [Lean.Order.MonoBind m]
    (f : α → m (ForInStep α)) (a : α) : m α :=
  f a >>= fun | .done x => pure x | .yield x => fixByOrder f x
partial_fixpoint

instance [Monad m] [∀ α, Lean.Order.CCPO (m α)] [Lean.Order.MonoBind m] :
    MonadLoop m where
  loop := MonadLoop.fixByOrder
  loop_of_exists f h a := by
    rw [MonadLoop.fixByOrder]
    congr; funext
    split <;> rfl

So yeah, basically partial_fixpoint

Robin Arnez (Oct 05 2025 at 19:27):

Huỳnh Trần Khanh said:

How is the program represented

Do we care? There are already plans to make IO opaque, so might as well also make that aspect of it opaque with suitable specification.

Dexin Zhang (Oct 05 2025 at 22:34):

Robin Arnez said:

Supporting repeat is tricky because as it is you can't do anything with it (as you said because of partial). If we want to change that we have to face a problem: We want it to still work for all monads (because it did before) but we also want to guarantee that it actually does something reasonable on infinite loop (and not always something completely random through Classical.choice). For example, an infinite loop in Option should be none. But then there are monads like ExceptT Bool Option. Without any more information than the monad instance, you can't distinguish between none, throw false and throw true but we want to ensure an infinite loop is none. Thus, we need more information, e.g. an additional type class. But then repeat would behave differently before and after defining an instance of such type class, where it would first return any of the three exceptions on infinite loop and after defining an instance return none in particular. Thus programs written before vs. after that point behave differently in logic, which could be confusing.

Thanks for pointing it out. This is indeed tricky, and even partial_fixpoint uses none or Classical.choice as the value of a non-terminating function.

Dexin Zhang (Oct 05 2025 at 22:57):

I'm thinking that to prove a Hoare triple of a loop, one could require a termination measure to be specified (which is typical in e.g. Dafny). This would work with partial_fixpoint. This is a total Hoare triple, and I don't know if it is even possible to define a partial Hoare triple when we work with an arbitrary monad.

Sebastian Graf (Oct 06 2025 at 11:00):

Is there plan to support loop or while loop in this framework?

No, no plans yet. I think we'd first need to support generating definition bodies for partial defs in the first place.

I'm thinking that to prove a Hoare triple of a loop, one can require a termination measure to be specified (which is typical in e.g. Dafny).

Indeed this would be nice, and I think it's what Joachim suggests above. But I don't think it's possible without producing definition bodies taking some sort of termination proof as argument. See #lean4 > Extrinsic termination proofs for well-founded recursion.

Bas Spitters (Oct 06 2025 at 12:10):

Dijkstra monads forever: termination-sensitive specifications for interaction trees
https://dl.acm.org/doi/10.1145/3434307

Combines Dijkstra monad with ITrees for non-termination. @Sebastian Graf Have you compared this with your implementation?

Dexin Zhang (Oct 06 2025 at 14:50):

Sebastian Graf said:

Indeed this would be nice, and I think it's what Joachim suggests above. But I don't think it's possible without producing definition bodies taking some sort of termination proof as argument. See #lean4 > Extrinsic termination proofs for well-founded recursion.

That discussion looks nice! Though, instead of finding a general solution for partial functions, we could alternatively customize the ForIn Loop instance.

Sebastian Graf (Oct 07 2025 at 07:35):

Combines Dijkstra monad with ITrees for non-termination. @Sebastian Graf Have you compared this with your implementation?

Yes. They use ITrees as a semantic domain for a denotational semantics of a first-order imperative language IMP. We need to define an expansion of while from Lean's native do notation. It would be very confusing and not at all performant to involve an ITree-based monad in that expansion (and I have no idea how something like that could solve our problem). This has nothing to do with Dijkstra monads/Std.Do; it's a fundamental limitation of how Lean's while is implemented in terms of the ForIn Loop instance.

Though, instead of finding a general solution of partial functions, we can alternatively customize the instance of ForIn Loop.

Just yesterday I brainstormed how to implement a while ... decreasing <measure mentioning let muts> do ... primitive that takes a different expansion path than while ... do ..., one that would allow for reasoning about such loops. Coming up with the right kind of well-founded relation/accessibility predicate is challenging.
It might be promising to build on the Iterator framework, because its Finite predicate solves exactly this problem.


Last updated: Dec 20 2025 at 21:32 UTC