Zulip Chat Archive

Stream: Program verification

Topic: Monadic program logic: Request for feedback


Sebastian Graf (Feb 14 2025 at 11:31):

Fellow Lean users,

As announced in the Lean FRO meeting at Lean Together 2025, I'm working on behalf of the Lean FRO on a Floyd/Hoare-style program logic framework to verify "arbitrary" monadic code written in Lean.
This thread is meant to collect requirements from you!

Scope: "Is this work relevant to my use case?"

TL;DR: this work could be relevant to you if you want to verify code written using Lean's do-notation.

Let me clarify what "program logic for monadic code" means by giving a sketch of the definition of Hoare Triple I have in mind:

def M α := ...
instance : Monad M where ...
def triple (x : M α) (pre : $states → Prop) (post : α → $states → Prop) : Prop := ...

(Find here the exact definition I'm currently working with, but the above is close enough to give you an idea.)
The intention is that any user may provide a definition of M and triple (plus structural rules) and then verify programs in M building on an extensive program logic with good automation and UX.
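
To make that concrete, here is a toy instantiation for a single-state monad (purely illustrative; the exact definition linked above is more general):

def triple {σ α : Type} (x : StateM σ α)
    (pre : σ → Prop) (post : α → σ → Prop) : Prop :=
  ∀ s, pre s →
    let r := Id.run (x.run s) -- r : α × σ
    post r.1 r.2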

In addition to scalable extrinsic proofs, the vision is to allow convenient intrinsic F*/Dafny/Verus-style proofs embedded into do blocks by elaboration into the extrinsic framework.

Note that although M α is "semantics of a program in Lean", you should be able to use this framework for any language with a denotational semantics into a monad.

This should include any strongly normalizing object language such as System F, but exclude untyped lambda calculus, System F with recursive types or mutable references (finding monadic semantic domains for the latter is an active research problem).

Request for feedback

I am interested in gathering any kind of feedback (please keep the focus on semantics rather than syntax, which is likely to change :smile:), or pointers to people I should reach out to that are not yet in the following list:

Jeremy Avigad, Mario Carneiro, Wojciech Nawrocki, John Tristan, Jonathan Protzenko

However, I'm specifically interested in feedback on the following items, which I encourage you to refer to in your answer:

  1. What is your use case? Can you point me to reasonably self-contained example code?
  2. We need to figure out what "arbitrary" monadic code means. For example, my prototype currently supports any kind of transformer stack that can be represented as StateT/ExceptT (plus ReaderT/WriterT and equivalents through StateT) over Id/IO. (Behavior of runtime IO functions will be axiomatised.) Is this sufficient for your use case? I'm asking because this plan does not currently include a Separation Logic to reason about IO.Ref or ST.Ref, and I'm not yet sure how much of a breaking change it would be to add this capability after the fact.
  3. The logic will include tactics on par with established Rocq frameworks such as CFML (of Separation Logic Foundations fame) or Iris. Good integration with the upcoming grind tactic is a given. Any other frameworks I should take a look at?
  4. My focus so far is on unary program logic. I have heard at least one voice who wants to work in a relational program logic (relating multiple program runs with different preconditions) for cryptographic proofs à la SSProve. Does your use case require a relational program logic?
  5. Any Related Work I should look at? I am aware of F*, Dafny, Verus, the SatisfiesM experiment from Batteries, the Rocq frameworks above and the literature on Dijkstra monads.

Thanks!

Mario Carneiro (Feb 14 2025 at 12:57):

Sebastian Graf said:

  1. The logic will include tactics on par with established Rocq frameworks such as CFML (of Separation Logic Foundations fame) or Iris. Good integration with the upcoming grind tactic is a given. Any other frameworks I should take a look at?

I assume you are aware of https://github.com/leanprover-community/iris-lean ? It is starting to gain momentum with some of the people working on Iris, so while it's currently not really ready for use, it may be in a better state once your work is getting closer to completion. Ideally these two frameworks will be able to work together seamlessly, and we should coordinate as needed to make sure this is the case.

Sebastian Graf (Feb 14 2025 at 15:17):

I would love to support that effort in some form. I could imagine the following definition for triple in terms of HeapProp (could probably be generalised to any BI):

def triple (x : M α) (P : HeapProp VAL) (Q : α → HeapProp VAL) :=
    (P -⋆ (OBSERVE x) Q)

This is the interpretation suggested on p141 in the Iris lecture notes.
But what is M? What is OBSERVE? What is VAL?

Provided that a user can answer these questions for their domain, I have no doubts that they will be able to use the framework I have prototyped.

I am however doubtful whether this will be useful to verify code using IO.Refs.
For example, even if we take M := IO and postulate OBSERVE as well as the structural rules of the logic, it is still quite difficult to find a VAL that lives in Type 0.
I have played around with using VAL := Σ α, α (because we want to allocate IO.Refs for any kind of α : Type), but that lives in Type 1. A potential shortcoming of my prototype is that this universe bump requires several ULifts in pure instantiations where we do not even need separation logic. Frankly, this is unacceptable.
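
Concretely, the universe bump in question:

-- One heap cell must be able to hold a value of an arbitrary type,
-- so it cannot live in Type 0:
def HeapVal : Type 1 := Σ α : Type, α
-- and a heap of such cells lives in Type 1 as well:
def Heap : Type 1 := Nat → Option HeapVal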

I tried the alternative of indexing all computations over a heap context Γ : Nat -> Type but stopped when I realized that I would need to sacrifice the functional dependency m -> stack in doing so, because Γ occurs in the instantiation of stack. That turned out to be a no-go as well, because it greatly regresses type inference for all other instances.

Long story short: I would be more than happy to anticipate support for future Separation Logics, however that support must not come at a loss of convenience for other instances of the framework.

Shreyas Srinivas (Feb 14 2025 at 15:33):

Sebastian Graf said:

As announced in the Lean FRO meeting at Lean Together 2025, I'm working on behalf of the Lean FRO on a Floyd/Hoare-style program logic framework to verify "arbitrary" monadic code written in Lean.
This thread is meant to collect requirements from you! [...]

I only know Iris and have passing experience with SatisfiesM and Dafny. The thing that annoyed me in SatisfiesM is how difficult it is to extract the guarantees from inside the monad and relate them to a specific input variable in a proof outside the monadic context. I also have a question: we currently use the Id monad for proofs. Would it be a good idea to add support for a monad with a heap structure for use with separation logics?

Bas Spitters (Feb 14 2025 at 15:48):

@Sebastian Graf Are you aware of the Dijkstra monad approach? I know there are other groups trying to implement it in Lean.
https://arxiv.org/abs/1903.01237

We've used it successfully for probabilistic relational Hoare logic in Coq: E.g. https://github.com/SSProve/ssprove

Sebastian Graf (Feb 14 2025 at 16:57):

Shreyas Srinivas said:

I only know Iris and have passing experience with SatisfiesM and Dafny. The thing that annoyed me in SatisfiesM is how difficult it is to extract the guarantees from inside the monad and relate them to a specific input variable in a proof outside the monadic context.

What is appealing about definitions such as SatisfiesM is their applicability across different monads. Unfortunately we were unable to prove basic properties such as conjunction (SatisfiesM x p → SatisfiesM x q → SatisfiesM x (fun a => p a ∧ q a)) with it. Then we tinkered with (the weaker)

def SatisfiesM {m : Type u → Type v} {α} [Monad m] (x : m α) : (α → Prop) → Prop :=
  fun p => ∀ {β} {f g : α → m β}, (∀ a, p a → f a = g a) → x >>= f = x >>= g

but had trouble defining infinite conjunction (i.e., the logic's forall) in terms of it. That was when I gave up on this approach and had a look at encodings of triple based on effect observations/Dijkstra monads (see my next post).

Shreyas Srinivas said:

I also have a question: we currently use the Id monad for proofs. Would it be a good idea to add support for a monad with a heap structure for use with separation logics?

I'm not completely sure I understand the question. If all programs that you want to prove correct can be expressed in Id (or any number of StateT on top of that), I don't think you need separation logic. You need separation logic only if your programs have dynamic memory allocation, because you need a way to say, e.g., "the ST.Ref a returned by ST.mkRef is distinct from any ST.Ref b returned by ST.mkRef in the future" and to then preserve any property of a after b has been allocated. Without dynamic memory allocation, you can express this property statically in the type system as a stack of StateTs; the frame rule is an appropriate liftM.
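
To illustrate that last point with a toy example (not prototype code): in a stack of two StateTs, the two state cells are separated by construction, and liftM plays the role of the frame rule:

abbrev TwoStates := StateT Nat (StateM Bool)

-- Touches only the outer Nat cell:
def bumpCounter : TwoStates Unit := modify (· + 1)

-- The lifted computation cannot touch the outer Nat cell, so any
-- property of that cell is preserved "for free":
def flipFlag : TwoStates Unit := liftM (show StateM Bool Unit from modify not)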

Sebastian Graf (Feb 14 2025 at 17:00):

Bas Spitters said:

Sebastian Graf Are you aware of the Dijkstra monad approach? I know there are other groups trying to implement it in Lean.
https://arxiv.org/abs/1903.01237

We've used it successfully for probabilistic relational Hoare logic in Coq: E.g. https://github.com/SSProve/ssprove

Yes, I am. In fact, until this week I had based my prototype framework around effect observations, defined by

class Observation (m : Type u → Type v) (w : outParam (Type u → Type x)) [Monad m] [∀ {α}, Preorder (w α)] extends MonadOrdered w where
  observe : m α → w α
  pure_pure : observe (Pure.pure a) = Pure.pure a
  bind_bind (x : m α) (f : α → m β) : observe (x >>= f) = observe x >>= (fun a => observe (f a))

You could define def triple x P Q := P ≤ x.observe Q in terms of this type class, and that would work well for most instances (except separation logic, where you want persistent separating implication as alluded to in my previous post instead of entailment).

Again, what is important for the planned framework is that it is maximally convenient for the "StateT/ExceptT on top of Id" use case. Observation/Dijkstra monads allow for a polymorphic monad stack, which is nice.
However, when I tried to actually prove stuff about monad-polymorphic functions such as forIn or foldlM, I noticed that it is incredibly difficult to do so. Here is an example for forIn, allowing the user to supply a loop invariant. (I have successfully applied this lemma in my experiments, so it is useful.) Note my difficulty expressing pre- and postconditions such as hpre and hstep in this framework:

theorem Observation.forIn_list {α β} {m : Type u → Type v} {w : Type u → Type x}
  [Monad m] [∀ {α}, Preorder (w α)] [obs : Observation m w]
  {xs : List α} {init : β} {f : α → β → m (ForInStep β)}
  (inv : List α → w β)
  (hpre : pure init ≤ inv xs)
  (hstep : ∀ hd tl,
      (inv (hd :: tl) >>= fun b => obs.observe (f hd b)) ≤ .yield <$> inv tl
    ∨ (inv (hd :: tl) >>= fun b => obs.observe (f hd b)) ≤ .done <$> inv []) :
    obs.observe (forIn xs init f) ≤ inv [] := by
    open Observation MonadOrdered in
    -- Intuition: inv encapsulates the effects of looping over a prefix of xs (and gets passed the suffix)
    -- The induction hypothesis is:
    let prog suffix := inv suffix >>= fun b => obs.observe (forIn suffix b f)
    suffices hind : prog xs ≤ prog [] by
      -- This is because the predicate transformer is stronger (≤) than the invariant,
      -- and the longer the suffix passed to `prog`, the more we rely on the predicate transformer.
      -- Conversely, the shorter the suffix, the more we rely on the invariant summarizing the effects of looping over a prefix.
      have : LawfulMonad w := inferInstance
      calc obs.observe (forIn xs init f)
        _ = pure init >>= fun b => obs.observe (forIn xs b f) := by simp only [pure_bind]
        -- For the initial case (full xs), we have an upper bound via hpre
        _ ≤ inv xs >>= fun b => obs.observe (forIn xs b f) := bind_mono hpre (by rfl)
        -- hind takes us from the full suffix xs to the empty suffix []
        _ ≤ inv [] >>= fun b => obs.observe (forIn [] b f) := hind
        -- For the final case (xs = []), forIn [] b reduces to pure b
        _ = inv [] := by simp
    -- Now prove hind : prog xs ≤ prog [] by induction
    clear hpre -- not needed any longer and would need to be generalized
    induction xs
    case nil => simp_all only [List.nil_append, le_refl]
    case cons hd tl h =>
      simp only [prog, List.forIn_cons, Observation.bind_bind, bind_assoc]
      cases hstep hd tl
      case inl hstep =>
        apply LE.le.trans _ h
        simp only [prog, List.forIn_cons, Observation.bind_bind, bind_assoc]
        apply LE.le.trans (MonadOrdered.bind_mono hstep (by rfl))
        simp only [bind_map_left, le_refl]
      case inr hstep =>
        simp only [prog, List.forIn_cons, List.forIn_nil, Observation.bind_bind, bind_assoc]
        apply LE.le.trans (MonadOrdered.bind_mono hstep (by rfl))
        simp only [bind_map_left, le_refl, prog]

Note

  • pure init ≤ inv xs expresses that init satisfies the precondition inv xs. Although I came up with this after long and hard thinking, I have no real intuition for what inv xs means. Its type w β is huge, potentially encompassing all kinds of predicate transformers.
  • In hstep, I want to express that the result of the loop iteration f hd b satisfies inv [] b' when it is .done b', and inv tl b' when it is .yield b'. This is incredibly difficult to state and requires a disjunction. The disjunction means hstep is going to be hard to automate for the user of the lemma.
  • While I could have required ∀ {α}, CompleteLattice (w α) and formulated hstep as (inv (hd :: tl) >>= fun b => obs.observe (f hd b)) ≤ (.yield <$> inv tl ⊔ .done <$> inv []), this would not have helped much with the automation problem.

So I found myself wondering whether we could find a way to expose the means to formulate proper pre and postconditions. In doing so I came up with the framework I propose and realized that we do not need Observation at all. Here's the equivalent lemma in the framework I propose:

theorem Triple.forIn_list {α β} {m : Type → Type}
  [Monad m] [LawfulMonad m] [MonadTriple m stack] [LawfulMonadTriple m stack]
  {xs : List α} {init : β} {f : α → β → m (ForInStep β)}
  (inv : List α → β → PreCond stack Prop)
  (hstep : ∀ hd tl b,
      ⦃inv (hd :: tl) b⦄
      (f hd b)
      ⦃r | match r with | .yield b' => inv tl b' | .done b' => inv [] b'⦄) :
  ⦃inv xs init⦄ (forIn xs init f) ⦃b' | inv [] b'⦄ := by
    induction xs generalizing init
    case nil => apply LawfulMonadTriple.triple_pure; simp
    case cons hd tl ih =>
      simp only [List.forIn_cons]
      apply LawfulMonadTriple.triple_bind
      case hx => exact hstep hd tl init
      case herror => simp
      case hf =>
        intro b
        split
        · apply LawfulMonadTriple.triple_pure; simp
        · exact ih

It is so much simpler. Maybe I'm doing something wrong or missing a trick in my encoding of effect observations. If that is indeed the case, I would like to see how to improve on Observation.forIn_list.

𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Feb 15 2025 at 17:45):

@Sebastian Graf I haven't tried thinking about the details of your proposed approach, but it looks well thought-out. The issues we ran into in the past were really quite basic:

  1. do notation does not provide a way to state and prove loop invariants. It seems that adding this is a foundational component of your strategy.
  2. With SatisfiesM specifically, we can prove p arg a where arg is the input, and a : α is the value produced by M α, but there is no way to reason about other cases, e.g. failure in Except: we may only show that if the result is .ok v then p arg v, but not that if the result is .err e then q arg e. Perhaps building except into your TransStack accounts for this?

allow convenient intrinsic F*/Dafny/Verus-style proofs embedded into do blocks by elaboration into the extrinsic framework

I am also curious about this statement. My current model of verified programming in DTT is that certain intrinsically-verified programs cannot be expressed extrinsically. Here is the idea:

def tizio : Id Nat := do
  for i in [0:n] with invariant P do -- imagined syntax
    ..
  have h : P (n - 1) := ..
  let elem := array[i]'(by simp [h])

In this program, the executable code includes an array-bound proof term derived from proofs about earlier parts of the program. I don't immediately see how this could be expressed extrinsically (i.e., write the code first and then prove it correct as a separate theorem), and I suspect that it cannot.

Henrik Böving (Feb 15 2025 at 18:16):

Hypothetically speaking, it should be possible to generate a function tizio_aux that contains the old body of tizio but also takes a proof argument abstracted over the necessary hypotheses and plugs these hypotheses in at the appropriate array spots, and then redefine tizio in terms of tizio_aux <your_external_proof_term>, right? I'm not saying that this is convenient, but it is at least a technical possibility and potentially automatable.

Devon Tuma (Feb 15 2025 at 19:14):

@Quang Dao and I have a definite use case for this in verifying cryptographic code (see here). Having a standardized program logic would be very useful to simplify proofs that currently require ad-hoc reasoning.

Specifically to your questions, in our use case we've been writing algorithms using a relatively simple monad stack, the main definition being a type to represent computations with oracles, which we use for uniform selection, random oracles, security game oracles, etc. More explicitly our main stack is:

inductive FreeMonad (f : Type u → Type v) (α : Type w) : Type (max (u + 1) v w)
  | protected pure (x : α) : FreeMonad f α
  | roll {β : Type u} (x : f β) (r : β → FreeMonad f α) : FreeMonad f α

-- Map from indexing set to domain/range type of that oracle
def OracleSpec (ι : Type u) : Type (max u (v + 1)) :=
  (i : ι) → Type v × Type v

-- Functor to represent queries with a given output type
inductive OracleQuery {ι : Type u} (spec : OracleSpec.{u,v} ι) :
    Type v → Type (max u v)
  | query (i : ι) (t : spec.domain i) : OracleQuery spec (spec.range i)

-- Most definitions are written in terms of this monad
def OracleComp {ι : Type u} (spec : OracleSpec.{u,v} ι) :
    Type w → Type (max u (v + 1) w) :=
  OptionT (FreeMonad (OracleQuery spec))

Operational semantics are essentially just the universal properties of OptionT/FreeMonad, letting us lift a mapping from OracleQuery to a new monad m to one from OracleComp to m. The monads you listed seem to cover all our use cases of this:

structure QueryImpl (spec : OracleSpec ι) (m : Type u → Type v) where
  impl {α : Type u} (q : OracleQuery spec α) : m α

def simulateQ [Monad m] [Alternative m]
    (so : QueryImpl spec m) : OracleComp spec α → m α :=
  OptionT.mapM' (FreeMonad.mapM' m so.impl)

-- Example responding to repeated queries with a cache using StateT
def cachingOracle [spec.DecidableEq] :
    QueryImpl spec (StateT spec.QueryCache (OracleComp spec)) where
  impl | query i t => do match (← get) i t with
    | Option.some u => return u
    | Option.none => QueryCache.queryFresh i t

Denotational semantics are just the case of mapping this into the PMF monad (actually OptionT PMF to represent a spmf):

noncomputable def evalDist [spec.FiniteRange] {α : Type u} :
    OracleComp spec α → OptionT PMF α :=
  simulateQ fun | query _ _ => OptionT.lift (PMF.uniformOfFintype _)

noncomputable def probOutput (oa : OracleComp spec α) (x : α) : ℝ≥0∞ :=
  (evalDist oa).run (some x)

I do think a relational program logic would be nice for our case, although the unary case would also be helpful.

Michael Sammler (Feb 17 2025 at 07:44):

Hi, this sounds quite interesting. I am currently not working with Lean, but I am quite interested in using it for Iris-based verification in the future.
We recently presented a modular method for building program logics at POPL: https://dl.acm.org/doi/pdf/10.1145/3704847
Our approach is based on the free monad (or more concretely its implementation in Rocq in the form of interaction trees) and it allows one to modularly combine different effects in a program and reuse proofs. In particular, I think this approach provides a higher degree of modularity and reuse than a Dijkstra-monad-based approach.
Our target was to modularly build program logics for Iris and I would expect that one wants to use a similar approach if Iris gets ported to Lean. But I have not thought about general monadic programs, so maybe the tradeoffs are a bit different there.
Maybe some ideas from that paper could also be useful for what you have in mind. Let me know if you have any questions about this paper. I would be happy to talk more about this.

Sebastian Graf (Feb 17 2025 at 09:24):

Wojciech Nawrocki said:

Sebastian Graf I haven't tried thinking about the details of your proposed approach, but it looks well thought-out. The issues we ran into in the past were really quite basic:

  1. do notation does not provide a way to state and prove loop invariants. It seems that adding this is a foundational component of your strategy.

Thanks! Indeed. Stating and proving loop invariants is just "the specification lemma" for forIn that I called Triple.forIn_list above.

  1. With SatisfiesM specifically, we can prove p arg a where arg is the input, and a : α is the value produced by M α, but there is no way to reason about other cases, e.g. failure in Except: we may only show that if the result is .ok v then p arg v, but not that if the result is .err e then q arg e. Perhaps building except into your TransStack accounts for this?

Yes; the PostCond type contains a failure "continuation" for every ExceptT in the stack. That said, this is not the only viable design. I think that if the one-specification-monad-to-rule-them-all SatisfiesM had worked out, we could encode the same information syntactically rather than semantically via the semantics of tryCatch.
So something like SatisfiesM (tryCatch (.ok <$> e) (pure ∘ .error)) (fun | .ok a => p a | .error e => q e) would work (tryCatch is used to "interpret" ExceptT here).
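
To illustrate the first point, the shape of PostCond specialised to a single ExceptT ε over one state σ would be something like (simplified sketch; the actual definition is indexed by the shape of the transformer stack):

def PostCondSketch (α ε σ : Type) : Type :=
  (α → σ → Prop) × (ε → σ → Prop) -- success case × failure "continuation"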

allow convenient intrinsic F*/Dafny/Verus-style proofs embedded into do blocks by elaboration into the extrinsic framework

I am also curious about this statement. My current model of verified programming in DTT is that certain intrinsically-verified programs cannot be expressed extrinsically. Here is the idea:

def tizio : Id Nat := do
  for i in [0:n] with invariant P do -- imagined syntax
    ..
  have h : P (n - 1) := ..
  let elem := array[i]'(by simp [h])

In this program, the executable code includes an array-bound proof term derived from proofs about earlier parts of the program. I don't immediately see how this could be expressed extrinsically (i.e., write the code first and then prove it correct as a separate theorem), and I suspect that it cannot.

By "intrinsic" above I was referring to taking a function that would type-check all on its own, and then augmenting its definition with requires, ensures, assert and invariant syntax to specify a contract intrinsically. I would hope that the ensuing verification conditions (predicate entailments) would be discharged by e.g. grind for maximal convenience.
This would be equivalent to implementing a post-processing step that extracts requires, ensures, asserts etc. and turns them into an extrinsic specification that can be proven entirely by a fixed set of tactics.
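
Schematically (imagined syntax on the intrinsic side, loosely mirroring the triple sketch from the OP):

-- Intrinsic style:
--
--   def incr (n : Nat) : Id Nat
--       ensures r => r = n + 1 := do
--     return n + 1
--
-- would elaborate to the unannotated def plus an extrinsic specification
--
--   theorem incr.spec (n : Nat) : triple (incr n) True (fun r => r = n + 1)
--
-- whose verification conditions are discharged by a fixed set of tactics.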

In your example, the function wouldn't type-check without the array-bounds proof, so that would not fall within the "intrinsic specification" use case.
On a related note, while true loops and partial def are unlikely to be supported. Perhaps the definitions provided by the partial_fixpoint work can be utilized, but that is hard to anticipate yet.

Sebastian Graf (Feb 17 2025 at 09:47):

Devon Tuma said:

Quang Dao and I have a definite use case for this in verifying cryptographic code (see here). Having a standardized program logic would be very useful to simplify proofs that currently require ad-hoc reasoning.

Specifically to your questions, in our use case we've been writing algorithms using a relatively simple monad stack, the main definition being a type to represent computations with oracles, which we use for uniform selection, random oracles, security game oracles, etc.

Indeed, that should work nicely. Can you give an example of a relational property you would like to see specified? That is, how would triple need to be generalized for your relational use case? Would something like this help? (NB: $states → Prop is like a set of tuples of $states)

def reltriple (x : M α) (pre : ($states → Prop) → Prop) (post : (α → $states → Prop) → Prop) : Prop := ...

Example for a pre and postcondition modelling non-interference on StateT {secret : Bool, public : Nat} Id:

def pre := fun S => ∀ σ₁ σ₂, S σ₁ → S σ₂ → σ₁.secret = true → σ₂.secret = false → σ₁.public = σ₂.public
def post := fun R => ∀ α₁ α₂ σ₁ σ₂, R α₁ σ₁ → R α₂ σ₂ → α₁ = α₂ ∧ σ₁.public = σ₂.public

(The interpretation of these predicates in reltriple must be monotonic.)

The problem with the relational case is that the design space is so large.

  • I think the definition reltriple above characterises many 2-properties (of the same program), but some relational properties want to relate two different programs with each other, and the above does not allow for that.
  • There is also the question of whether one wants to relate a code path that throws a particular exception ε with a successful execution returning an α. This affects the encoding of PostCond.

I think we might want to hold back on this feature until we have more information on what kind of relational properties people want to express.

Sebastian Graf (Feb 17 2025 at 10:49):

Michael Sammler said:

Hi, this sounds quite interesting. I am currently not working with Lean, but I am quite interested in using it for Iris-based verification in the future.
We recently presented a modular method for building program logics at POPL: https://dl.acm.org/doi/pdf/10.1145/3704847
Our approach is based on the free monad (or more concretely its implementation in Rocq in the form of interaction trees) and it allows one to modularly combine different effects in a program and reuse proofs. In particular, I think this approach provides a higher degree of modularity and reuse than a Dijkstra-monad-based approach.

Great work! That looks like a neat approach if you want to reason about your own bespoke object language.
Though a small nit: Judging from the non-structural recursion in the application case in Fig. 9 I think I would have referred to the approach as "coinductive big-step" rather than "denotational".
On the other hand, this makes me realize that my use of "denotational" in the OP could perhaps be weakened to "coinductive big-step". That said, the program logic will only be useful to reason about calls to the interpreter, and not so much for reasoning about expressions in the object language; that would require bespoke specification lemmas and perhaps an entire tactic stack on top. At this point, I'm unsure whether the program logic is so useful at all for this use case...

Just this weekend I had a look at the FreeSpec library which is based on free monads, but they don't really apply it to a language with an interesting semantics...

Regarding Iris: the goal of the planned framework is to provide a program logic for reasoning about programs written in Lean as shallowly-embedded object language. It is not currently a goal to be forward compatible with every future project, especially not if it is not about verifying programs written in Lean.
The program logic of Iris (in contrast to just its logic of bunched implications implemented in iris-lean) in particular is geared towards operational semantics of "deeply-embedded" languages.

I think the main limiting factor when it comes to reusing the planned framework for custom, non-standard monads is that the shape of pre and postconditions must be encodable in a fixed language that is currently targeted towards stacks of StateT s and ExceptT ε. I'm not sure if this is expressive enough for general free monads or interaction trees.

Michael Sammler (Feb 17 2025 at 11:22):

Thanks for your thoughts on this!

Though a small nit: Judging from the non-structural recursion in the application case in Fig. 9 I think I would have referred to the approach as "coinductive big-step" rather than "denotational".

I guess it depends on the view point and we also had some discussion about whether to use "denotational" or some other term when writing the paper. We landed on "denotational" since I would argue that one can view InteractionTrees as a semantic domain that we denote our expressions into, but one can argue about it. (In fact, I would see the non-structural recursion you mention as an argument for calling the approach denotational since we denote recursion into a recursion combinator of the semantic domain instead of doing the fixpoint ourselves, but I guess this is a matter of perspective.)

That said, the program logic will only be useful to reason about calls to the interpreter, and not so much for reasoning about expressions in the object language; that would require bespoke specification lemmas and perhaps an entire tactic stack on top.

I should clarify: The Program Logics a la Carte paper has two parts that can be separated: First, it provides a program logic for effectful programs (in the form of interaction trees) that provides nice modular reasoning principles. Second, it shows how one can use this program logic to build program logics for object-level languages on top of it. For your purposes, only the first part is really relevant and I think you can ignore all parts about the object-level languages. (This might be interesting to look at in the future, but probably not in the initial step.)

Just this weekend I had a look at the FreeSpec library which is based on free monads, but they don't really apply it to a language with an interesting semantics...

At least from my perspective, interaction trees have superseded the FreeSpec library as the canonical implementation of free monads in Rocq, so I would look at them instead. There have been quite a few interesting semantics formalized using interaction trees at this point (see e.g. the paper).

The program logic of Iris (in contrast to just its logic of bunched implications implemented in iris-lean) in particular is geared towards operational semantics of "deeply-embedded" languages.

Just to make sure that we are on the same page: The Program Logics a la Carte paper proposes to replace the program logic part of Iris that is geared towards operational semantics of deeply-embedded languages with the denotational approach based on interaction trees. I think the denotational way has significant advantages (in particular wrt. modularity) and if Iris gets ported to Lean, it might make sense to port the denotational approach to build a program logic instead of the old operational-semantics-based one.

I think the main limiting factor when it comes to reusing the planned framework for custom, non-standard monads is that the shape of pre and postconditions must be encodable in a fixed language that is currently targeted towards stacks of StateT s and ExceptT ε. I'm not sure if this is expressive enough for general free monads or interaction trees.

This is actually something that surprised me when working on that paper: the approach we use there, which is not too different from what you are proposing, scales to a surprisingly large number of effects. In the paper, we support the following effects among others: state, non-determinism, undefined behavior, aborting, concurrency, I/O with trace properties, non-termination. However, some of these effects rely on the expressiveness of Iris (e.g. non-termination relies on step-indexing), but maybe it is possible to achieve a similarly large set of effects for your approach.

It is not currently a goal to be forward compatible with every future project, especially not if it is not about verifying programs written in Lean.

This makes a lot of sense and I fully agree with this goal. To be clear, I don't want to argue that it makes sense to directly use the approach from the paper, instead I just want to share my perspective as someone who has worked on similar problems in the past and hope that this is useful for finding a solution that works well for Lean.

Sebastian Graf (Feb 17 2025 at 11:46):

Thanks again for your reply! I will definitely have a closer look at the first part of your paper then.
It is pretty likely that interaction trees are out of scope as an out-of-the-box instance of MonadTriple because there currently is no formalisation of interaction trees in Lean's standard library. However, I would hope that your paper will guide me to a framework which could be instantiated for a user-written interaction tree library.

Son Ho (Feb 17 2025 at 11:52):

Hi, this would be really useful for the Aeneas project that I'm working on (in collaboration with Jonathan Protzenko and others), where we do extrinsic proofs about pure models generated from Rust code, which live in an error monad:

inductive Result (α : Type u) where
  | ok (v: α): Result α
  | fail (e: Error): Result α
  | div -- for partial_fixpoint

We had to put a fair amount of work into writing a progress tactic to do proofs in a Hoare-logic style; it is roughly equivalent to CFML's xapp, though in our case we are not concerned with framing separation logic specifications but rather with automatically discharging preconditions.

For now, because we only work with one monad and do not specify failing or divergent executions (we prove total correctness theorems), we write our theorems in a very specific shape (progress is mostly syntax-driven; I would like something more general, of course). For instance, this is the theorem used to reason about addition of U32 machine integers, which progress knows how to manipulate (it automatically looks it up whenever it sees a call to +):

@[progress] -- saves the theorem in the relevant database
theorem U32.add_spec {x y : U32}
  (hmax : (x : Int) + y ≤ U32.max) : -- precondition
  -- there is generally a list of preconditions, which live in different theories (i.e., require different solvers)
  ∃ z, x + y = ok z ∧ -- there are usually several existentially quantified variables
  (z : Int) = x + y -- post-condition
  := by ....

This is a specific class of proofs, but quite informative of what we need in practice, I believe, and I have a few examples of proofs that you can have a look at. For instance, there is the tutorial which showcases proofs about list-manipulating functions and a (minimal) big num library. We also have proofs about an AVL tree and a hashmap (the latter one is quite old and needs deep cleaning). I'm currently working on the verification of a Rust implementation of the ML-KEM post-quantum cryptographic primitive (not public yet).

I believe there are quite a few things that need improvement and that would benefit from a more systematic approach.

  1. For now, we only work with an error monad but in the future we will have to work with code which lives in more expressive monads to reason about some form of aliasing, I/O, or concurrency. An extremely useful feature would be a way of seamlessly switching between monads, for instance by having a way of lifting theorems (from, say, an error monad to a state-error monad; see the sketch after this list). Also, I mentioned that we currently only prove total correctness theorems, but users of the framework may also want to prove partial correctness theorems. It would be nice to have a way of easily switching between partial correctness and total correctness proofs, in particular by factoring theorems out. For instance, it should be possible to write a correctness theorem about + which specifies both the ok and fail cases, and use it in both partial and total correctness proofs, in the latter case by using the specification of fail as a precondition that has to be disproved.

  2. It is currently not extremely convenient to reason about sub-expressions; the problem of loops has been mentioned above, but I also need a way of easily reasoning about expressions like this (today progress is good at reasoning about function calls):

let i1 ← if b then ok i else i + 1
...
  3. (These are perhaps more technical details at this stage, but I think it's important to mention them to give a sense of the proof experience.) So far, most of the work has been put into writing the progress tactic so that the proofs are smooth, in particular when it comes to discharging preconditions. This tactic uses several simple, yet extremely useful tactics (simp, a tactic which looks a bit like assumption but uses more clever heuristics to instantiate metavariables, and a solver based on omega); I can share a document about this if you are interested. Those tactics are currently hardcoded, but in the future we want them to be customizable: one non-obvious problem here is how to quickly figure out which solver should be applied to which precondition (we can't just try them one after the other, as they tend to be expensive). It can also happen that we want to switch between different flavours of "spec" theorems depending on the program we verify. For instance, I mentioned the problem of partial vs. total correctness above. Another example is that in most situations today we want to use the + theorem I showed above, which specifies addition in terms of Int, but when reasoning about a function performing bitwise operations we want to systematically look up theorems which specify arithmetic operations in terms of bitvector operations, etc.
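
Here is a small sketch of the monad lifting from item 1 (hypothetical, using Except in place of our Result): a fact established in the error monad transfers to a state-error stack along liftM, and this is the kind of theorem lifting we would like to get for free:

theorem liftM_spec {σ ε α : Type} (x : Except ε α) (a : α)
    (h : x = .ok a) (s : σ) :
    (liftM x : StateT σ (Except ε) α).run s = .ok (a, s) := by
  subst h; rfl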

Joachim Breitner (Feb 17 2025 at 13:01):

Son Ho said:

do not specify failing or divergent executions (we prove total correctness theorems) we

You mean “partial correctness”, right?

Son Ho (Feb 17 2025 at 13:13):

Joachim Breitner said:

You mean “partial correctness”, right?

I was slightly imprecise. I believe a partial correctness theorem would go like this:

forall x y, f x = ok y -> post y

In our case we prove panic freedom, which means our theorems are total correctness theorems, I think:

forall x, pre x -> exists y, f x = ok y /\ post y

But because here fail actually models a panic in Rust, and as it is possible to recover from panic in some situations, we may want to prove theorems of the following shape:

forall x,
match f x with
| ok y -> post y
| fail -> not (pre x) -- something like this probably?

Joachim Breitner (Feb 17 2025 at 14:21):

Ah, yes, that looks like total correctness to me. I misread the prose then.

Sebastian Graf (Feb 17 2025 at 15:10):

Thanks, Son Ho, your pointer to Aeneas' progress tactic and its examples will be incredibly valuable. I think I'm already starting to encounter similar challenges in the examples I'm looking at (without a tactic such as progress). The Result type seems entirely reasonable to instantiate triple for.

more expressive monads to reason about some form of aliasing, I/O, or concurrency

As alluded to in #Program verification > Monadic program logic: Request for feedback @ 💬, I have played around with an encoding of separation logic, but realized that you would need to somehow restrict the kinds of computations that end up in the heap to a closed universe, so that its entries can live in Type 0 rather than the Type 1 needed for the general Σ α, α. For this reason, I do not yet have plans to include a monad transformer adding a separation logic-based heap out of the box, but since the syntax of predicates is the same as for StateT, we could nurture such a transformer in a separate library first. I do not yet know how to model concurrency in this framework appropriately.

This tactic uses several simple, yet extremely useful tactics [...]. I can share a document about this if you are interested.

I would be very interested in that. It is something that I have been wondering as well, but put on my backlog because it is impossible to solve without seeing concrete instances, which I hopefully will see soon.

Son Ho (Feb 17 2025 at 15:22):

Sebastian Graf said:

I would be very interested in that.

I wrote this down in my PhD dissertation, chapter 14.3.3 "Hoare-Logic Style Proofs and the progress Tactic". I will be very happy to provide additional feedback if you have questions.

Son Ho (Feb 17 2025 at 15:35):

Sebastian Graf said:

but since the syntax of predicates is the same as for StateT, we could nurture such a transformer in a separate library first

Yes, at this stage I'm also not sure what's the best way to proceed and what I will actually need either. But I intend to study the problem of verifying stateful code (i.e., code which I can't model only with an error monad) later this year: this should provide interesting insights about how to combine these kinds of reasonings.

Mario Carneiro (Feb 17 2025 at 23:28):

Sebastian Graf said:

Regarding Iris: the goal of the planned framework is to provide a program logic for reasoning about programs written in Lean as shallowly-embedded object language. It is not currently a goal to be forward compatible with every future project, especially not if it is not about verifying programs written in Lean.
The program logic of Iris (in contrast to just its logic of bunched implications implemented in iris-lean) in particular is geared towards operational semantics of "deeply-embedded" languages.

I think the main limiting factor when it comes to reusing the planned framework for custom, non-standard monads is that the shape of pre and postconditions must be encodable in a fixed language that is currently targeted towards stacks of StateT s and ExceptT ε. I'm not sure if this is expressive enough for general free monads or interaction trees.

My inclination regarding this is that you should take care to separate the syntax of do notation programs with invariants from the interpretation of those programs and specifications wrt lean shallow embedding. I agree that lean shallow embedded programs should be the main focus, but it should be possible to leverage some parts of the frontend to do similar things even if the specification language is something fancy.

This is more or less what I mean by "working together with iris-lean": it should be possible for iris-lean to present an interface that feels native to the greatest extent possible. I don't want to tie your hands too much here, but please don't do like do did and make a completely non-extensible system (and leave it unmaintained for a long time). I'm sure you weren't really thinking of doing so anyway, but I want to stress that extensibility and pluggability of the architecture is very much a use case that matters to me.

Wrenna Robson (Mar 20 2025 at 18:03):

@Sebastian Graf I just found this thread. I am literally right now trying to verify some code that uses do notation and running into the issue that I don't have for loop pre- and post-conditions! (In particular, it makes working with mutable arrays where I want to prove facts about their size at the end of the loop quite tricky.)

Wrenna Robson (Mar 20 2025 at 18:04):

In general I am very interested in this work.

Sebastian Graf (Mar 20 2025 at 18:17):

You could give my prototype a try; here's an example for proving a loop-based fibonacci function correct: https://github.com/sgraf812/mpl/blob/682c1f7ef8dfee8f1b565338c5a8f3c80d6442f1/Tests/Toy.lean#L284.
Disclaimer: Not "ready for production" yet.

What kind of code are we talking about? What kind of mutable arrays in particular? Is it the local kind of mutability, i.e. just let mut arr : Array Blah := ...?

Wrenna Robson (Mar 23 2025 at 07:19):

Yes, it's nothing particularly fancy. Though it uses vectors rather than arrays which makes the types more of a headache.

Sebastian Graf (Apr 25 2025 at 14:24):

I think the dust on the semantic foundations and the general UX vision for manual proofs has settled.
It's time to elicit a bit of guinea pig feedback! You can depend on mpl by adding the following requires to your lakefile.toml:

[[require]]
name = "mpl"
git = "https://github.com/sgraf812/mpl"
rev = "pre-release"

Alternatively, you can just clone the repository and play with the files in Test/, such as Test/Toy.lean.

Bear in mind that this is not an official release! This library is not ready for production use, so please do not share it widely yet. The automation is still hacky and underdeveloped and subject to change.

I would be very happy for you to import MPL, give it a go and let me know about the general proof mode UX (examples in Test/Toy.lean). I am specifically interested in your opinion on

  1. The semantic foundations. Hoare triples are encoded based on a notion of predicate transformers that is parameterized by a PostShape parameter. Batteries are included for the typical transformer primitives (StateT, ExceptT, Id, a better-behaved Idd, IO). Would you say this is enough for your use case?
  2. The Iris-like proof mode for manipulating stateful hypotheses in a pointfree way, one that does not require repeated introduction of state variables. (Check out Test/ProofMode.lean for a comprehensive set of small examples.)
  3. The prototypical syntax in the README.md that extends def-elaboration with support for intrinsic verification features (requires, ensures, invariant). It bears repeating: You will find the proof automation lacking once you type in more complicated examples; this is mostly about the general look and feel for now. You have to import MPL.Experimental to opt into that feature.

Next up on my agenda is writing an actual verification condition generator that replaces the simp-based mwp tactic and grow that in parallel with small to medium-sized verification projects. We were thinking about a congruence closure algorithm; any other ideas?

Mario Carneiro (Apr 27 2025 at 20:56):

https://github.com/sgraf812/mpl

Bas Spitters (May 01 2025 at 18:28):

@Sebastian Graf this looks nice. I have yet to get a full overview. Are you planning a write-up?
The automatic proofs of equivalence between imperative and functional code are nice.
We added them in the Rocq backend for Hax some time ago (for the state monad):
https://eprint.iacr.org/2023/185

Eric Wieser (May 02 2025 at 05:09):

It looks like your Idd would be replaced by lean4#7352

Sebastian Graf (May 02 2025 at 06:41):

Bas Spitters said:

Sebastian Graf this looks nice. I have yet to get a full overview. Are you planning a write-up?
The automatic proofs of equivalence between imperative and functional code are nice.
We added them in the Rocq backend for Hax some time ago (for the state monad):
https://eprint.iacr.org/2023/185

Thanks! Unfortunately, a write-up is not on my agenda yet; building automation currently has my full attention.
Your work on relating Hax with Jasmin programs looks interesting, thanks for sharing. One open semantic question of mpl is whether translating StateM σ α into assertions (i.e., preconditions) of the form σ -> Prop is really future-proof with respect to a hypothetical future in which mpl builds on separation logic, for example to verify code using ST.Ref. In that future, σ -> HProp (where HProp ~= Heap -> Prop) seems like a redundant encoding if StateT σ can be encoded like let mut in your pure functional Hax translation, that is, by allocating a "static"/"global" heap location for each StateT σ wrapper in the monad stack. This is also what AutoCorrode does (see for example swap_ref_contract). However, this all hinges on whether future mpl will build on separation logic, which would sacrifice a lot of the simplicity of the current approach.
I wonder: You write that you have a model of the global heap. Does this model include some notion of separating conjunction in the logic? How do you cope with aliasing? Presumably you rely on let mut aliasing being syntactically apparent? Edit: Ah, 5.4 has some discussion about disjointness. Sounds quite involved...

Sebastian Graf (May 02 2025 at 06:44):

Eric Wieser said:

It looks like your Idd would be replaced by lean4#7352

Indeed! Fixing lean4#7352 would make Idd unnecessary. It's somewhere on my backlog to take over your patch unless you merge it first :)

Bas Spitters (May 04 2025 at 12:39):

@Sebastian Graf The equivalence proof between imperative and functional code, is before the connection with jasmin, although it extends to it. We haven't considered the connection to separation logic yet (only superficially), as we are mainly concerned with cryptographic code, where this seems less relevant.

Markus de Medeiros (Jun 20 2025 at 05:36):

re. question 5 in your initial post, you might want to look at PulseCore. I haven't been following either project very closely, so I'm not sure how similar they are at a technical level, but it's an Iris-like logic for shallowly-embedded monadic programs (in F*) which seems at least adjacent to your work.

Sebastian Graf (Jun 23 2025 at 10:11):

Indeed, their use of indirection theory (which I had not been aware of) to define step-indexed heaps and predicates is very interesting. Much less conceptual overhead compared to Iris' COFE setup, it appears. It would be a great fit for verifying programs in a monad that has an inherent or synthesized notion of time/step-indexing. In my Lean experiments so far, I haven't seen any. However, if one were to define separation logic for ST on top of MPL, indirection theory sounds like a good fit.

Shreyas Srinivas (Jun 29 2025 at 18:09):

@Sebastian Graf : I want to try out mpl, but before that I would like to ask: does it support the notion of ghost states like iris. If so, is there a specific example I should look at in the repo?

Shreyas Srinivas (Jun 29 2025 at 18:10):

Here I just mean variables that are not accessible in the code, only in the verification conditions.

Sebastian Graf (Jun 30 2025 at 07:31):

Shreyas Srinivas said:

Sebastian Graf : I want to try out mpl, but before that I would like to ask: does it support the notion of ghost states like iris. If so, is there a specific example I should look at in the repo?

I archived mpl on Friday after I merged my work into Core. The easiest way is to pick a nightly after 06-27 and import Std.Tactic.Do, then state the specs you want and use mvcgen to prove them. This is not an official release yet, but I'm very happy for feedback from early adopters. In the next Q, I'll be working on Markus' human-eval-lean project for dogfooding and examples.

Sebastian Graf (Jun 30 2025 at 07:53):

Re: ghost state. The framework supports ghost variables, that is, schematic variables of the specification {{P}} prog {{Q}} that occur only in P or Q, but not in the program prog. This spec

theorem mkFreshNat_spec [Monad m] [WPMonad m sh] :
  ⦃⌜#fst = n ∧ #snd = o⌝⦄
  (mkFreshNat : StateT (Nat × Nat) m Nat)
  ⦃⇓ r => ⌜r = n ∧ #fst = n + 1 ∧ #snd = o⌝⦄

can be considered an example of this. It simply gives names n and o to parts of the pre-state. I think this is in line with "variables that are not accessible in the code, only in the verification conditions".

This is of course a far cry from what systems such as Iris or Why3 consider ghost state. Such advanced notions of ghost state should be modeled by a custom WP instance that reflects the notion of ghost state that is specific to your particular monad into a heap-like structure that is part of the σ in PostShape.arg σ.

So for example when your program monad has a notion of time/step-indexing, then you can define your heap model in terms of this stratification and a notion of ghost state as well. I have not attempted anything like it yet, because the main intended use case of the logic is simple StateT/ExceptT wrappers without much dynamic memory allocation and certainly no notion of a runtime-relevant "program counter" which is often present in a deep rather than shallow embedding of a program semantics.

So TL;DR: if you want Iris-like "ghost state", then first we need to clarify what monad your programs run in, in order to define what "ghost state" means. Then we can perhaps find a way to generalize many different kinds of monad-specific "ghost state" notions in a reusable library.

Alok Singh (Jul 01 2025 at 22:04):

How could I write this Dafny program in Lean with the new syntax?

method add(a: array<int>, b: array<int>) returns (res: array<int>)
requires a.Length == b.Length
ensures res.Length == a.Length
ensures forall i :: 0 <= i < a.Length ==> res[i] == a[i] + b[i]
{}

Sebastian Graf (Jul 02 2025 at 09:46):

Hmm. The idiomatic way to implement add is via zipWith and that wouldn't be monadic. You wouldn't use Hoare logic to prove it correct.
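
(For reference, something like the following; a real version would use zipWith or bounds proofs instead of the panicking a[i]!:)

def addPure (a b : Array Int) : Array Int :=
  (Array.range (min a.size b.size)).map fun i => a[i]! + b[i]!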

The slightly less idiomatic way would be to define add in Id and use a for loop:

def add (a b : Array Int): Id (Array Int) := do
  let mut res := Array.emptyWithCapacity a.size
  for ha : i in [0:min a.size b.size] do
    have : i < min a.size b.size := by get_elem_tactic
    have : i < a.size := by grind
    have : i < b.size := by grind
    res := res.push (a[i] + b[i])
  return res

@[grind →]
theorem range_elim : List.range' s n = xs ++ i :: ys → i = s + xs.length := by
  intro h
  induction xs generalizing s n
  case nil => cases n <;> simp_all [List.range']
  case cons head tail ih =>
    cases n <;> simp [List.range'] at h
    have := ih h.2
    simp [ih h.2]
    omega

theorem add_spec (a b : Array Int) (h : a.size = b.size) :
    ⦃⌜True⌝⦄
    add a b
    ⦃⇓ r => ∃ hr : r.size = a.size, ∀ i (ha : i < r.size), r[i] = a[i] + b[i]⦄ := by
  mvcgen [add]
  case inv => exact ⇓ (r, is) => ∃ hr : r.size = is.pref.length, ∀ i (hr : i < r.size) (ha : i < a.size), r[i] = a[i] + b[i]
  all_goals simp_all; try grind

The bounds checks are annoying; I would try to refactor add to use Vector if possible.
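
A sketch of that refactor (addV is hypothetical; it assumes the core Vector API with ofFn and index-by-Fin getElem):

-- With `Vector`, the size invariant lives in the type, so there are no
-- runtime bounds checks and no `i < a.size` side goals in specs:
def addV {n : Nat} (a b : Vector Int n) : Vector Int n :=
  Vector.ofFn fun i => a[i] + b[i]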

As you can see by the use of simp_all, I haven't yet learned enough about grind to come up with the proper annotations in the logic. By the end of the Q it should just be all_goals grind.

Markus Himmel (Jul 06 2025 at 11:54):

Shameless self-promotion: I have written a blog post about my initial experiences with the monadic verification framework: https://markushimmel.de/blog/my-first-verified-imperative-program/ Feedback is welcome.

Ching-Tsun Chou (Jul 06 2025 at 18:22):

@Markus Himmel The paper link in footnote 1 of your blog post is broken.

Joscha Mennicken (Jul 06 2025 at 18:42):

Maybe due to #general > Discussion: New lean-lang.org? :D

Markus Himmel (Jul 06 2025 at 18:44):

Ching-Tsun Chou said:

Markus Himmel The paper link in footnote 1 of your blog post is broken.

Fixed, thanks.

Ching-Tsun Chou (Jul 07 2025 at 18:55):

@Markus Himmel I have some questions about your blog post:

  1. After the mvcgen, I see 6 goals in InfoView. What exactly does the subsequent case tactic do to them?
  2. After the case, I see 5 goals which look very different from the previous 6 goals. Where do they come from?
  3. Would it be possible for the assertions in Hoare triples to be interspersed with the code? If the loop body itself has a relatively complex control flow, wouldn't mvcgen produce many more VCs which would be hard to keep track of?

Markus Himmel (Jul 07 2025 at 19:21):

Ching-Tsun Chou said:

After the mvcgen, I see 6 goals in InfoView. What exactly does the subsequent case tactic do to them?

case focuses on the goal with the given name (which is the first goal in this case).

Ching-Tsun Chou said:

After the case, I see 5 goals which look very different from the previous 6 goals. Where do they come from?

They are the remaining goals after closing the first goal. They look different than before the case tactic because they mention the loop invariant, which appears as ?inv before the case block and is replaced by the provided loop invariant after the case block.

Ching-Tsun Chou said:

Would it be possible for the assertions in Hoare triples to be interspersed with the code?

There were some experiments with this as seen here, but you'll have to ask Sebastian Graf what the status is.

Alok Singh (Jul 07 2025 at 23:04):

https://github.com/Beneficial-AI-Foundation/NumpySpec/tree/main/NumpySpec has a bunch of autogenerated examples with the new syntax. They’re all sorried rn

Sebastian Graf (Jul 08 2025 at 06:34):

Markus Himmel said:

Ching-Tsun Chou said:

Would it be possible for the assertions in Hoare triples to be interspersed with the code?

There were some experiments with this as seen here, but you'll have to ask Sebastian Graf what the status is.

The PoC over there was promising, but we'll revamp do elaboration in Q4 and add the syntax for invariants etc. as part of that, so you'll have to wait a bit more.

Alok Singh (Jul 08 2025 at 21:23):

Alok Singh said:

https://github.com/Beneficial-AI-Foundation/NumpySpec/tree/main/NumpySpec has a bunch of autogenerated examples with the new syntax. They’re all sorried rn

Files like https://github.com/Beneficial-AI-Foundation/NumpySpec/blob/main/NumpySpec/Numpy_Absolute.lean give the error (which doesn't show up in infoview but does on lake build)

error: NumpySpec/Numpy_Absolute.lean:26:5: type mismatch
  ∀ (i : Fin n), result[i] = x[i].abs
has type
  Prop
but is expected to have type
  Assertion PostShape.pure

Other files have the same issue, but the code seems innocuous. Same error even if I replace the absolute function with an actual impl.

Sebastian Graf (Jul 09 2025 at 06:05):

Note that Assertion PostShape.pure is defeq to Prop, which makes this error all the more puzzling.

My guess is that it is the same underlying issue as #8766 and it would be fixed by #9256. Unfortunately, I had to shelve the patch yesterday after realizing that it breaks type class inference in Mathlib. It's unlikely that we'll see this fix landed.

Try wrapping your Prop in ⌜⌝: ⌜∀ (i : Fin n), result[i] = x[i].abs⌝. That should work reliably. You will also be required to do that in the next release due to universe polymorphism changes (then we'll have Assertion PostShape.pure =?= ULift Prop).
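
Concretely, with the names from the error message above (the surrounding triple is only sketched):

-- fails: a bare `Prop` where an `Assertion` is expected
--   ⦃⇓ result => ∀ (i : Fin n), result[i] = x[i].abs⦄
-- works: wrap the pure proposition in ⌜⌝
--   ⦃⇓ result => ⌜∀ (i : Fin n), result[i] = x[i].abs⌝⦄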

Théophile (Jul 20 2025 at 20:00):

Hi @Sebastian Graf , thanks for the nice work on monadic reasoning! I am currently playing with your framework (especially using State monad), and have some questions (note: I am a beginner in Lean, although knowledgeable in other proof assistants).

I have 4 low-level questions, put as comments in the code present in this playground, summarized here:

  • Question 1: Is there a cool notation for PostCond.partial that I missed?
  • Question 2: sometimes the notation does not work on SPred, I have not been able to understand why (see code)
  • Question 3: is there a nice way to interact with the lemma SPred.entails_cons_intro?
  • Question 4: how to deal with spred(…) in the goal? (grind doesn't know how to deal with it)
  • Feedback 1: misleading error when using mintro without further argument

Other questions and feedback:

  • Feedback 2: the code works on the playground with "latest mathlib" and Lean v4.22.0-rc3 on my computer, but has ugly type errors in the Hoare triples with Lean nightly on the playground?
  • Question 5: is this the implementation of a well-known concept I can read about in a research paper, or is it an invention of your own? (I know about Hoare triples / weakest preconditions, but not about this implementation of SVal / SPred / Assertion / PostCond etc.)
  • Question 6: I have been grepping the lean4 repo to look for the mvcgen implementation but haven't been able to find it; where is it located?

Thanks!

Sebastian Graf (Jul 21 2025 at 09:11):

Thanks for giving Std.Do a try!

Question 1: Is there a cool notation for PostCond.partial that I missed?

None yet. Maybe you have something in mind? I think a use case would help; I feel like PostCond.partial is far less common than PostCond.total, and when you use the former you often want to specify a more refined failure condition, but I might be wrong.

Question 2: sometimes the notation does not work on SPred, I have not been able to understand why (see code)

You need to wrap it with the spred quoter. The syntax for PostCond.total does this automatically for you:

  Std.Do.PostCond.partial (fun x => spred(state_invariant ∧ pred1 x))

Question 3: is there a nice way to interact with the lemma SPred.entails_cons_intro?

You can use mintro ∀s to do that! Sorry about the lack of documentation/reference manual content at the moment.

Question 4: how to deal with spred(…) in the goal? (grind doesn't know how to deal with it)

spred(...) is just a macro. It means that the usual prop notation such as ∧ actually desugars to its SPred variant such as SPred.and.
Note that using simp will "destroy" the SPred proof mode and should only be used as an endgame tactic.
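
To illustrate the desugaring (the exact target names here are from memory and may differ):

-- Within `spred(...)`, the connectives elaborate to their `SPred` variants:
--   spred(P ∧ Q)  -- `SPred.and P Q`, not `And P Q`
--   spred(P → Q)  -- `SPred.imp P Q`, not a plain implication
-- which is why `grind` (which works on `Prop`) does not see through them.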


I can see that even with those changes there is no good way to conclude your proof, which is unfortunate.
I think that is caused by relying too much on SPred [TheState] being defeq to TheState -> Prop.
You will find that the latest nightly unfortunately breaks this assumption completely, because now SPred [TheState] is defeq to TheState -> ULift Prop.
I changed your test file to reflect that. Still, I would expect that there is a manual proof using the m*-style tactics of the SPred proof mode; however, my current attempt chokes on mintro not having reduced the application of SPred.and to the state s.
I will add your code as a test case, thanks!

Théophile (Jul 21 2025 at 13:25):

Cool, thank you for the answers!

None yet. Maybe you have something in mind? I think a use case would help; I feel like PostCond.partial is far less common than PostCond.total, and when you use the former you often want to specify a more refined failure condition, but I might be wrong.

I am interested in proving security properties of cryptographic protocols that operate in an adversarial environment, so functions may fail for a variety of reasons (e.g. a message may fail to parse, a signature verification may say "invalid" and we don't proceed further, or a decryption may fail). Therefore in that area we would rely on PostCond.partial, because we cannot meaningfully characterize when a function succeeds.

Sorry about the lack of documentation/reference manual content at the moment.

No worries, this is expected when using cutting-edge features :)

Sebastian Graf (Jul 21 2025 at 13:32):

Therefore in that area we would rely on PostCond.partial, because we cannot meaningfully characterize when a function succeeds.

That makes sense. I suppose your language does not have any means to catch an "exception" then? Then yes, PostCond.partial is what you want.

Théophile (Jul 21 2025 at 14:00):

Yes, catching exceptions doesn't seem useful in that area (other than e.g. reporting errors to the user, but that's outside the scope of what we prove)

Markus de Medeiros (Jul 21 2025 at 18:48):

I finally looked closer at this project and it is super cool! Now that iris-lean is further along, I'm thinking a bit about how our projects can interoperate. I have a ton of questions that might help me understand it better (and a few pieces of feedback). If you have time to answer any of them I would appreciate it!

High-level:

  1. I know you mentioned above that documentation is not a priority right now, but are there any existing program logics you're modeling this after that I might be able to read more about? I've read a few loosely related papers about these logics based on predicate transformers but I'm still very green.
  2. What work needs to be done to extend the framework with a new monadic effect? I'd guess that you'd need at least a WP and WPMonad instance, but what does this workflow look like in general?
  3. Related, I see a bunch of lemmas around the development written for specific monads (for example, *_run, the MonadLift/MonadFunctor instances related to PredTrans) and I don't understand which are meant for "object-level" monads vs "specification-level" monads (those special ones you have PostShapes for).

Low-level questions:

  1. What is the HasFrame class (resp. Frame.frame proof rule) meant to be used for? Do you have any examples that use it?
  2. Why is PostShape an inductive type? Is there any way to specify postconditions for effects other than those that are state-like (with PostShape.arg) or error-like (with PostShape.except)?
  3. Why do PredTrans need to be conjunctive? I'm not surprised that they would be, I'm more just curious about where this constraint arises.
  4. If I understand correctly, the *.by_wp theorems are your adequacy theorems. Why only Id, StateM, and EStateM? Do you expect there to be something more general at play here?
  5. Related to the last one, I noticed that they all have PostCond.total postconditions. Is the logic only meant to have PostCond.total postconditions at the top level or do you think that other adequacy statements are provable?

Feedback:

  • The "Barrel" term is used a couple times, does this mean anything?
  • The term "partial" is overloaded, I think here you mean safe p -> postcondition p (right?) but people also use that to mean terminating p -> postcondition p.

Again thanks for anything you have time to answer :)

Sebastian Graf (Jul 22 2025 at 08:57):

I'm glad you like it :)

are there any existing program logics you're modeling this after that I might be able to read more about?

I... don't think there are. I used predicate transformers as the semantic foundation, as that is how Hoare triples are commonly defined. I recalled traditional predicate transformers for an IMP-style language (there is plenty of material on the internet, e.g., Wikipedia) and connected the dots for monadic programs. The initial challenge then was finding a good translation from arbitrary monads into predicate transformers. I played with Dijkstra monads (which translate an algebra of monads into a corresponding continuation monad) but that turned out to be cumbersome (search the thread). I simplified the approach into the WP and WPMonad type classes and was quite happy with that. Nice and boring :)

Perhaps there is a connection to Iris in that every StateT σ gives rise to an additional σ -> . parameter in the assertion, and hypotheses on these state parameters work like persistent propositions in Iris (I call these "stateful"). I.e., you may duplicate them but once you apply the rule of consequence (for c1 >> c2) they are "gone" (because the state changes and the hypothesis may no longer hold). This is in contrast to pure hypotheses which live in the regular Lean context.

What work needs to be done to extend the framework with a new monadic effect? I'd guess that you'd need at least a WP and WPMonad instance, but what does this workflow look like in general?

If your effect needs a new monad, then yes, you need to instantiate WP and WPMonad. Then you also need to prove and register a specification for your new effect (i.e., some monadic action), for which you need to unfold your definition of WP.wp. SpecLemmas.lean has them all for the pre-defined ones. This mechanism also works for loop invariants (e.g., forIn_list).
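
If the action lives in a monad that already has a WP instance, registering a spec is just an @[spec] lemma; a minimal made-up example (the proof script may vary):

def twice (n : Nat) : Id Nat :=
  pure (2 * n)

@[spec]
theorem twice_spec (n : Nat) :
    ⦃⌜True⌝⦄ twice n ⦃⇓ r => ⌜r = 2 * n⌝⦄ := by
  mvcgen [twice]
  all_goals simp_all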

Related, I see a bunch of lemmas around the development written for specific monads (for example, *_run, the MonadLift/MonadFunctor instances related to PredTrans) and I don't understand which are meant for "object-level" monads vs "specification-level" monads (those special ones you have PostShapes for).

The only "specification-level" monad is PredTrans ps for some ps : PostShape. Users of the VC generator never interact with this monad (or even the fact that it is a monad) unless they are specifying their own effects.
I suppose you are referring to the *_apply lemmas in the PredTrans development? I just took a second look and deleted them. The approach I settled for no longer needs them and they are mostly confusing. Thanks for making me take that second look.
The reason I don't need them is that it's often much preferable to stay in the "syntactic"/definable world where we see wp[someEffect a b c] rather than in the "semantic" world somePredTrans wp[someEffectPart1] wp[someEffectPart2] ... due to reasons of full abstraction/definable elements. (This is important when handling scoped effects such as withReader.)

What is the HasFrame class?

It is used by the mframe tactic. It tries to lift as many stateful hypotheses into the pure Lean context as possible. That is, a smart mpure that infers the pure "frame" of hypotheses that can survive the next effect.
This tactic might be useful for iris-lean as well (although I find it unfortunately unlikely that we can share implementations).
Here's the docstring:

`mframe` infers which hypotheses from the stateful context can be moved into the pure context.
This is useful because pure hypotheses "survive" the next application of modus ponens
(`Std.Do.SPred.mp`) and transitivity (`Std.Do.SPred.entails.trans`).

It is used as part of the `mspec` tactic.

```lean
example (P Q : SPred σs) : ⊢ₛ ⌜p⌝ ∧ Q ∧ ⌜q⌝ ∧ ⌜r⌝ ∧ P ∧ ⌜s⌝ ∧ ⌜t⌝ → Q := by
  mintro _
  mframe
  /- `h : p ∧ q ∧ r ∧ s ∧ t` in the pure context -/
  mcases h with hP
  mexact h
```

Note that the stateful proof mode is entirely independent from the weakest precondition translation/Hoare triples.

Why is PostShape an inductive type? Is there any way to specify postconditions for effects other than those that are state-like (with PostShape.arg) or error-like (with PostShape.except)?

It is inductive precisely because I wanted a simple, closed definition! Do you have an effect in mind that cannot be modelled using this notion of postcondition? As I wrote in the OP (RFC (2)), I would be very keen to know about this use case.

Why do PredTrans need to be conjunctive? I'm not surprised that they would be, I'm more just curious about where this constraint arises.

One user wanted to separately prove a global spec {P1} x {Q1} and a more local spec {P2} x {Q2} to get {P1 /\ P2} x {Q1 /\ Q2} (cf. Triple.and) without repeating the proof for the global spec, and making this a native property of PredTrans was (sadly) the only way I could make that work.

If I understand correctly, the *.by_wp theorems are your adequacy theorems. Why only Id, StateM, and EStateM? Do you expect there to be something more general at play here?

Yeah, this is just an oversight. I should add the other theorems as well :) Thanks for the nudge.

Related to the last one, I noticed that they all have PostCond.total postconditions. Is the logic only meant to have PostCond.total postconditions at the top level or do you think that other adequacy statements are provable?

I just strengthened EStateM.by_wp to

theorem EStateM.by_wp {α} {x : EStateM.Result ε σ α} {prog : EStateM ε σ α} (h : EStateM.run prog s = x) (P : EStateM.Result ε σ α → Prop) :
  (⊢ₛ wp⟦prog⟧ post⟨fun a s' => P (EStateM.Result.ok a s'), fun e s' => P (EStateM.Result.error e s')⟩ s) → P x

The "Barrel" term is used a couple times, does this mean anything?

Ah, this is PL semantics lingo relating to "double-barrelled continuations". Perhaps it is not common enough a term that I should use it. "Barrel" in this context just refers to either the success or one of the failure "continuations"/assertions in a PostCond.
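
In code, the two barrels are just the components of the PostCond tuple, as in EStateM.by_wp above:

-- post⟨fun a s' => ...,  -- success barrel: result `a`, final state `s'`
--      fun e s' => ...⟩  -- failure barrel: exception `e`, final state `s'`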

The term "partial" is overloaded, I think here you mean safe p -> postcondition p (right?) but people also use that to mean terminating p -> postcondition p.

Yeah, I actually don't like "partial" very much because it often conflates divergence of a program with throwing an exception. The former is a meta-property of the program but the latter is a recoverable object-level property of the program. But in the programs we consider, divergence does not play any role, so partiality refers to whether or not an exception has been thrown.
Then "partial correctness" indeed means safe p -> postcondition p (if safe p means "no exception in p") and is expressed via PostCond.partial.

Do you maybe have a suggestion for better names?

Markus de Medeiros (Jul 22 2025 at 14:26):

Thanks for the response, you answered most of my questions very decisively!

Perhaps there is a connection to Iris ...

Yeah there's probably something here. That said there are also a variety of ways you might want to relate the state in a state monad (and in fact all other parts of the program state) to Iris state. At a high level, I'm trying to understand which problems that Std.Do is good at solving, so the weaknesses of one project can be supported by the strengths of the other.

One idea which is present in quite a few Iris papers is that programs in complicated languages can have much simpler semantic models. Type systems are one example, Std.Do-style monadic programs are another. So if I can use Iris to prove that my horrifying deeply embedded program refines a simpler Std.Do model, your logic may be able to take over.

The only "specification-level" monad is...
I suppose you are referring to the *_apply lemmas in the PredTrans development?

Sure, I was confused about those, but (with apologies for the #XY) in general I was trying to figure out which lemmas are meant to be applied to specifications versus applied to objects. For example: you laid out the steps I would need to take to support code that uses a new monad. In that case do I also need to add a new *.by_wp theorem? Or are the *.by_wp theorems "specification-level" in the sense that they correspond to the cases of PostShape and therefore don't need to be extended to support new effects?

Don't worry about it if this question doesn't have a clear answer. I know the object/specification line is blurry in this world.

The reason I don't need them is that it's often much preferable to stay in the "syntactic"/definable world

By this do you mean "definable using the syntax of do notation" or something else? I kind of hope that the former isn't the case. In SampCert (where we had shallowly embedded monadic programs) we pretty frequently used terms that couldn't be written using do in intermediate stages of our proofs.

Do you have an effect in mind that cannot be modelled using this notion of postcondition?

None come to mind. It is definitely good to know where this decision came from. You don't need to convince an Iris user that State itself can express a lot. What still confuses me though is why you need this type to be closed to extension... maybe I missed this in your code, but do you have an example of where having this be closed is useful?

Ah, this is PL semantics lingo relating to "double-barrelled continuations".

Just a gap in my PL jargon then, no need to change it :)

Do you maybe have a suggestion for better names?

For partiality, I just think a comment is fine. The term is used inconsistently so trying to pick a universally understood one is a lost cause haha.

Sebastian Graf (Jul 23 2025 at 11:18):

relate the state in a state monad (and in fact all other parts of the program state) to Iris state.

So a stack of StateT σ1 (StateT σ2 (... (StateT σn Id))) Nat leads to assertions p : σ1 -> ... -> σn -> Prop. In a Separation Logic you could encode such an assertion as l1 |-> s1 * l2 |-> s2 * ... * ln |-> sn * [p s1 s2 ... sn]. That is, every StateT corresponds to a "static" address li which is by birth alias-free with any of the other StateTs.
What SL can do and the StateT encoding can't is "dynamic" memory allocation; that is, implement ST.mkRef. The equivalent StateT operation would need to change the type and generally require some sort of polymorphic recursion/lenses in the object program. Of course you could allocate all your dynamic ref cells from a heap stored in one particular StateT layer, but then you will lack the aliasing guarantees that SL provides you (unless you implement an SL just for that single StateT layer).

So: if you have an Iris proof for some program in ST (which I doubt is a well-formed statement given that Iris needs to deeply embed its object language and give it an OpSem) which does not itself allocate, you could probably translate your proof obligation into Std.Do and blast through the proof with mvcgen and grind.

In that case do I also need to add a new *.by_wp theorem? Or are the *.by_wp theorems "specification-level" in the sense that they correspond to the cases of PostShape and therefore don't need to be extended to support new effects?

Good question. I don't know :) You probably need to define such a theorem for every new WP instance. These lemmas are mostly useful if (1) you have some pure proof obligation such as isPrime n = true and (2) isPrime is internally implemented using M.run prog (for example M=Id). Then the by_wp lemma is convenient to expose the monadic prog and start the proof in terms of wp[prog] Q. Here's an example from the kitchen sink test file. Different Ms need different by_wp lemmas.
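
A sketch of that pattern with a made-up isEven (the proof outline mirrors the generalize trick used elsewhere in this thread):

-- Hypothetical pure function implemented via a do block:
def isEven (n : Nat) : Bool := Id.run do
  let mut b := true
  for _ in [0:n] do
    b := !b
  return b

-- To prove the pure fact `isEven 4 = true`, a `by_wp`-style lemma exposes
-- the do block, roughly:
--   generalize h : isEven 4 = b
--   apply Id.of_wp_run_eq h
--   mvcgen
--   ...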

In SampCert (where we had shallowly embedded monadic programs) we pretty frequently used terms that couldn't be written using do in intermediate stages of our proofs.

I meant "definable in the computational monad m α". I suppose these terms in SampCert you are referring to are still definable as a computation in m α? By contrast, the semantic domain PredTrans of WP.wp is far larger than the image of WP.wp. Many properties can only be shown considering the image of WP.wp (the "definable elements" of PredTrans), a classic gotcha of denotational semantics. Hence it is best to keep WP.wp "as far out as possible"; pushing it inwards by unfolding loses information. But I suppose this is not super important for users of Std.Do :)

What still confuses me though is why you need this type to be closed for extension... maybe I missed this in your code but do you have an example of where having this be closed is useful?

One reason I like it closed is that I can define PredShape.args as a closed definition. Maybe that could live in a type class that an abstract PredShape would need to implement... And similarly we would need a method for FailConds and all its properties.
Doable, but it is unclear why you would want to do that when keeping it closed is the simpler and more direct solution :)

Markus de Medeiros (Jul 23 2025 at 13:35):

which I doubt is a well-formed statement given that Iris needs to deeply embed its object language and give it an OpSem

One of the ideas I'm exploring right now is that an iris-lean program logic could be used to relate a model of a programming language that does have an operational semantics (e.g. Rust à la RustBelt or C à la RefinedC) to an ST program. In this case, StateT state doesn't have to correspond to physical state, just to the stateful part of your program that you're trying to model. In this setup, iris-lean becomes a great way to construct models of your program and Std.Do becomes a great way to reason about them. Could be interesting, but I'm still thinking about it!

I meant "definable in the computational monad m α".

Great, that's what I was hoping you meant. Thanks for the explanations :)

Bas Spitters (Aug 02 2025 at 18:02):

@Markus de Medeiros @Sebastian Graf The experimental Lean backend for the Hax tool for verifying Rust programs already uses Std.Do. E.g. https://github.com/cryspen/hax/pull/1590

Sebastian Graf (Aug 04 2025 at 12:52):

Nice! hax-lib/proof-libs/lean/Lib.lean looks like what I had in mind to support Aeneas.

Clément Blaudeau (Aug 04 2025 at 13:08):

Thanks! Indeed we're very interested in your work @Sebastian Graf , as we're building a Lean backend (still very early) for Hax. So far, we are very happy with the design, and mvcgen works quite well for our use case.

I have a couple of questions :

  1. When defining Hoare triples, is it possible to have the precondition in the context for the definition of the post-condition? For instance, this is ill-formed:
@[spec]
theorem test_spec (α : Type) (a : Array α) (i : Nat) :
  ⦃⌜i < a.size⌝⦄
  (Result.ofOption a[i]? arrayOutOfBounds)
  ⦃⇓ r => ⌜r = a[i]⌝⦄ := by sorry

Indeed, the hypothesis i < a.size is not accessible in the post-condition, making a[i] ill-formed. It can be solved by turning i < a.size into an extra argument of the theorem, but it feels less natural than keeping it in the Hoare triple

  2. We define Hoare triples for our arithmetic operations with overflows, which look like this (where +? is our custom add):
@[spec]
theorem HaxAdd_spec_bv (x y : isize) :
  ⦃⌜¬ (BitVec.saddOverflow x.toBitVec y.toBitVec)⌝⦄
  (x +? y)
  ⦃⇓ r => ⌜r = x + y⌝⦄ := by mvcgen [instHaxAdd]

However, we also need rewriting rules (because +? can also appear in postconditions, once the job of mvcgen is finished), like:

theorem HaxAdd_spec_bv_rw (x y : isize) :
  ¬ (BitVec.saddOverflow x.toBitVec y.toBitVec) →
  x +? y = Result.ok (x + y)
:= by simp [instHaxAdd]

Is there a way to use the Hoare triples as rewriting rules: use HaxAdd_spec_bv instead of HaxAdd_spec_bv_rw in proofs?

Clément Blaudeau (Aug 05 2025 at 10:57):

  3. Are there ways to define several attributes besides spec, so that mvcgen could be configured to use different sets of specifications? Typically, we have bitvector-based specifications and nat/int-based specifications depending on the use cases:
/-- Bitvec-based specification for rust multiplication on usize -/
@[spec_bv]
theorem HaxMul_spec_bv (x y : usize) :
  ⦃⌜¬ (BitVec.umulOverflow x.toBitVec y.toBitVec)⌝⦄
  (x *? y)
  ⦃⇓ r => ⌜r = x * y⌝⦄ := by ...

/-- Nat-based specification for rust multiplication on usize -/
@[spec_nat]
theorem HaxMul_spec_nat (x y : usize) :
  ⦃⌜x.toNat * y.toNat < USize.size⌝⦄
  (x *? y)
  ⦃⇓ r => ⌜r = x * y⌝⦄ := by ...

It would be very useful to be able to do mvcgen with spec_bv or mvcgen with spec_nat and get the right preconditions

Sebastian Graf (Aug 05 2025 at 11:17):

So far, we are very happy with the design, and mvcgen works quite well for our use case.

Thanks, that's very nice to hear :)

It can be solved by turning i < a.size into an extra argument of the theorem, but it feels less natural than keeping it in the hoare triple

Moving it into the regular Lean context is exactly what I would do. It is desirable to move as many facts as possible into the pure Lean context so that they survive applications of the rule of consequence (i.e., SPred.entails.trans). The mframe tactic tries to help and automate such moves, and it is automatically applied by mspec (and mvcgen) to do these moves for you, but it doesn't hurt to define specs in a way that has "maximum information content".

Is there a way to use the Hoare triples as rewriting rules: use HaxAdd_spec_bv instead of HaxAdd_spec_bv_rw in proofs?

Sorry, I don't completely understand what you want to do. Per the previous answer, I would write HaxAdd_spec_bv as

@[spec]
theorem HaxAdd_spec_bv (x y : isize) (h : ¬ (BitVec.saddOverflow x.toBitVec y.toBitVec)) :
  ⦃⌜True⌝⦄
  (x +? y)
  ⦃⇓ r => ⌜r = x + y⌝⦄ := by mvcgen [instHaxAdd]

Why would x +? y appear in postconditions? It's part of the program syntax, so it should only occur in the program position of wp[.] or of Hoare triples {P} . {Q}. Unless you mean inside a wp application?

Are there ways to define several attributes besides spec?

Hmm. I don't have plans for this kind of feature... However, would it work for you to use attribute [scoped spec] and then open the respective namespace?

/-- Bitvec-based specification for rust multiplication on usize -/
theorem HaxMul_spec_bv (x y : usize) :
  ⦃⌜¬ (BitVec.umulOverflow x.toBitVec y.toBitVec)⌝⦄
  (x *? y)
  ⦃⇓ r => ⌜r = x * y⌝⦄ := by ...

namespace BV

attribute [scoped spec] HaxMul_spec_bv

end BV

/-- Nat-based specification for rust multiplication on usize -/
theorem HaxMul_spec_nat (x y : usize) :
  ⦃⌜x.toNat * y.toNat < USize.size⌝⦄
  (x *? y)
  ⦃⇓ r => ⌜r = x * y⌝⦄ := by ...

namespace Nat

attribute [scoped spec] HaxMul_spec_nat

end Nat

...
  open BV in
  mvcgen ...
...

Clément Blaudeau (Aug 07 2025 at 11:04):

Thank you for the answers!

  • For 1: indeed, moving every pure fact out of the precondition made mvcgen happier and solved the issue
  • For 3: scoped attributes do the trick, allowing for a nice open SpecBV in mvcgen or open SpecNat in mvcgen

Regarding 2: as we export pre- and post-conditions from the Rust code, we can have Hoare triples that look like this (where the barrett function reduces modulo 3329):

def __ensures (value : i32) (result : i32) := do
  pure ((← value %? 3329) = result)

theorem barrett_spec (value : i32) :
  ⦃⌜value ≤ 0x40000⌝⦄
  (barrett_reduce value)
  ⦃⇓ result => ⌜__ensures value result = pure true⌝⦄ := by
 ...

Indeed, the Rust-written post-condition __ensures will also mention panicking arithmetic %?. Therefore, once all the panic-freedom side conditions have been proved and only the post-condition remains, I would like to do rewrites on %? that are basically the same as the Hoare triple.

Sebastian Graf (Aug 07 2025 at 12:11):

Ah, sounds like you actually want a relational/binary Hoare logic to prove that barrett_reduce value is bisimilar to value %? 3329. That's tricky and not currently a use case that is actively supported by Std.Do.

In your particular example you can first run mvcgen on barrett_reduce value, and then you will have some proof obligation in which __ensures value result occurs. You would run mvcgen on that once more.
That said, this approach is not particularly compositional and is likely unpleasant to work with.

Regarding your original question (2), I imagine you can define a lemma that derives HaxAdd_spec_bv_rw from HaxAdd_spec_bv. This could be helpful (this is assuming nightly; leave out .down on the RC):

theorem Result.by_wp {α} {x : Result α} (P : Result α → Prop) :
  (⊢ₛ wp⟦x⟧ post⟨fun a => ⌜P (.ok a)⌝, fun e => ⌜P (.fail e)⌝⟩) → P x := by
    intro hspec
    simp only [instWP] at hspec
    split at hspec <;> simp_all [PredTrans.const]

theorem Result.to_rw {α} {x : Result α} {a : α} :
  (⊢ₛ wp⟦x⟧ (⇓ r => r = a)) → x = .ok a := by
    intro hspec
    apply Result.by_wp (P := (· = .ok a))
    simp at *
    exact hspec

theorem Result.to_rw' {α} {x : Result α} {a : α} :
  (⦃P⦄ x ⦃⇓ r => r = a⦄) → P.down → x = .ok a := by
    intro hspec hp
    apply Result.by_wp (P := (· = .ok a))
    simp at *
    exact hspec hp

Clément Blaudeau (Aug 07 2025 at 14:57):

Thank you for the rewriting lemmas!

In this specific example (barrett reduction), I might want a proof of bisimilarity, but in a lot of cases we're only interested in panic-freedom, which seems to fit the Std.Do use cases. Interestingly, the different (while, I believe, equivalent) ways of writing the same specification are handled differently by mvcgen. Considering that __requires and __ensures return a Result Bool, we have:

theorem barrett_spec1 (value : i32) :
  ⦃⌜__requires value = pure true⌝⦄
  (barrett_reduce value)
  ⦃⇓ result => ⌜__ensures value result = pure true⌝⦄
:= by sorry

theorem barrett_spec2 (value : i32) :
  ⦃⌜__requires value = pure true⌝⦄
  (barrett_reduce value)
  ⦃⇓ result => wp⟦__ensures value result⟧ (⇓ r => r)⦄
:= by sorry

theorem barrett_spec3 (value : i32) :
  wp⟦__requires (value)⟧ post⟨
    fun r => r = true ⊢ₛ
      wp⟦barrett_reduce value⟧
        (⇓ result =>
        wp⟦__ensures value result⟧ (⇓ r => r)),
    fun _ => True⟩  -- if the precondition panics, nothing to prove
:= by sorry

theorem barrett_spec4 (value : i32) :
  wp⟦__requires value⟧ post⟨
  fun r => r = true ⊢ₛ
    wp⟦(do let result ← barrett_reduce value;
           __ensures value result)⟧ (⇓ r => r),
  fun _ => True⟩  -- if the precondition panics, nothing to prove
:= by sorry

The first one (that I originally wrote) requires rewriting lemmas to prove __ensures .. = pure true, and tedious splitting on the hypothesis __requires .. = pure true. The second is much better for mvcgen, as it also steps through __ensures, but still requires manual splitting on the hypothesis __requires .. = pure true. The third can be fully handled by mvcgen, although I need to run it twice (first on the precondition, then use apply SPred.pure_intro; intros, then re-run it). The fourth is equivalent to the third one I think, but maybe more readable? (debatable). Do you have an opinion on the style of specifications and their potential limitations (from a proof point of view)?

Thank you for your help!

Jakob von Raumer (Aug 07 2025 at 23:27):

Is it expected that simp_all doesn't simplify away the wp occurrences in this example?

def matchFn (a b : Bool) : Id (Nat × Nat) := do
  let x : Nat := match a with
    | .true => 42
    | .false => 23
  let y : Nat := match b with
    | .false => 4
    | .true => 8
  (x, y)

theorem matchFn_spec {a b} :
    ⦃⌜True⌝⦄
    matchFn a b
    ⦃post⟨fun (x, y) => ⌜y % 2 = 0⌝⟩⦄ := by
  mvcgen [matchFn]
  split <;> split <;> simp_all
  sorry

Jakob von Raumer (Aug 07 2025 at 23:33):

(Sorry if this is more or less the same issue that @Clément Blaudeau was asking about :thinking: )

Jakob von Raumer (Aug 08 2025 at 00:41):

Generally, I also struggle to find an easy way to go between a simple implication and the Hoare triple for total functions, which is relevant at points where monadic code and pure code touch, e.g. via Id.run inside a pure function.

Sebastian Graf (Aug 08 2025 at 07:29):

Do you have an opinion on the style of specifications and their potential limitations (from a proof point of view) ?

I feel like I'm lacking context here, but I think I would first try to convert the pre- and postconditions from a "program" involving Result Bool into something without it, by running a metaprogram that does mvcgen/simp_all on

⊢ₛ wp⟦__requires (value)⟧ (⇓ r => r = true)

This should yield the desired Prop that you can put in the precondition.

Alternatively, you could provide a tactic that your users should run before the actual mvcgen invocation in barrett_spec that rewrites the pre- and postconditions using (the inverse of) Result.to_rw and mvcgen.

Sebastian Graf (Aug 08 2025 at 07:31):

Is it expected that simp_all doesn't simplify away the wp occurrences in this example?

It kind of is, because there is no monadic computation that it could apply the spec to. If you insert the pure (x, y) that is implicit by defeq, you'll get

a b : Bool
x : Nat := match a with
| true => 42
| false => 23
y : Nat := match b with
| false => 4
| true => 8
⊢ y % 2 = 0

and can proceed however you want.
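
That is, the defeq-explicit version would read (sketch):

def matchFn' (a b : Bool) : Id (Nat × Nat) := do
  let x : Nat := match a with
    | .true => 42
    | .false => 23
  let y : Nat := match b with
    | .false => 4
    | .true => 8
  pure (x, y)  -- explicit `pure` instead of the bare `(x, y)`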

Sebastian Graf (Aug 08 2025 at 07:35):

Generally, I also struggle to find an easy way to go between an simple implication and the Hoare triple for total functions, which is relevant at points where monadic code and pure code touch, e.g. by Id.run inside a pure function.

Does Id.by_wp help?

theorem Id.by_wp {α : Type u} {x : α} {prog : Id α} (h : Id.run prog = x) (P : α → Prop) :
  (⊢ₛ wp⟦prog⟧ (PostCond.total (fun a => P a))) → P x := h ▸ (· True.intro)

You use it with generalize, like in Bhavik's example (and a few others), to "focus" on/punch a hole into the part of the target P (i.e., a function application) that is defined by Id.run, and then you can use mvcgen on its do block to show the target P.

Sebastian Graf (Aug 08 2025 at 12:03):

While adding some grind lemmas to eliminate the idiosyncratic Std.Range.toList range = xs ++ y :: ys assumptions that arise through the current encoding of loop invariants using the new Std.List.Zipper, I got carried away trying out a different encoding for loop invariants.

In lean4#9766, I switched to using a Fin (xs.length + 1) to encode the iteration state of the loop. All in all I like that, because it makes specification of loop invariants simpler (below, xs used to be a Zipper; now it's just the list the loop iterates over):

-  case inv => exact (⇓ (r, xs) => (∀ x, x ∈ xs.suff → x ≤ 5) ∧ r + xs.suff.length * 5 ≤ 25)
+  case inv => exact (⇓ (r, xs, i) => r + (xs.drop i).length * 5 ≤ 25)

Unfortunately, some of the proofs get more complicated due to type dependencies; a hyp h : i < [1:5].toList.length in the local context will not be simplified to h : i < 4 if h is used in an array bounds check.

Does anyone have an opinion here? I tend to prefer the Fin-based encoding because it means one fewer concept (zippers) to learn for users.

Sebastian Graf (Aug 12 2025 at 17:36):

I rejected the Fin (xs.length + 1) idea and parked it at https://github.com/leanprover/lean4/commits/sg/parked-fin-specs/.

Sebastian Graf (Aug 12 2025 at 17:36):

Here's where I'm drafting a reference manual entry/tutorial: https://hackmd.io/@sg-fro/BJRlurP_xg

Bhavik Mehta (Aug 12 2025 at 17:40):

Earlier in the thread you wrote

Moving it into the regular Lean context is exactly what I would do. It is desirable to move as many facts as possible into the pure Lean context so that they survive applications of the rule of consequence (i.e., SPred.entails.trans).

and I think something along these lines would be a welcome addition to the entry, since I had otherwise assumed the exact opposite would be true

Alok Singh (Aug 13 2025 at 07:44):

Wrapping functions to be verified in Id since it requires a monad is kinda ugly (but makes sense)

Bhavik Mehta (Aug 13 2025 at 12:44):

That's not specific to this framework, that pattern was around for perhaps a couple of years before this!

Alok Singh (Aug 13 2025 at 21:19):

TIL

Sebastian Graf (Aug 14 2025 at 16:47):

Bhavik Mehta said:

and I think something along these lines would be a welcome addition to the entry, since I had otherwise assumed the exact opposite would be true

added a subsection as a digression. Not good writing style, but it should do the job.

The linked tutorial should be "complete enough" now. As always, I'm open for feedback! Though I'll be away for a month now, so I can't really incorporate feedback until after that.

Frédéric Dupuis (Aug 17 2025 at 01:15):

Is there a trick required to make mvcgen work with loops over Std.PRange?

Here's an MWE that has me puzzled:

import Std.Tactic.Do

open Std.Do

def foo (l : List Nat) : Nat := Id.run do
  let mut out := 0
  for _ in l do
    out := out + 1
  return out

def bar (n : Nat) : Nat := Id.run do
  let mut out := 0
  for _ in 0...n do
    out := out + 1
  return out

/-- This works as expected: -/
theorem foo_eq (l : List Nat) : foo l = l.length := by
  generalize h : foo l = x
  apply Id.of_wp_run_eq h
  mvcgen
  case inv1 =>
    exact ⇓⟨xs, out⟩ => ⌜True⌝
  case vc1 => sorry
  case vc2 => sorry
  case vc3 => sorry

/-- This fails in `inv1`: -/
theorem bar_eq (n : Nat) : bar n = n := by
  generalize h : bar n = x
  apply Id.of_wp_run_eq h
  mvcgen
  case inv1 =>
    -- Invalid match expression: The type of pattern variable 'xs' contains metavariables:
    -- (0...n).toList.Cursor
    exact ⇓⟨xs, out⟩ => ⌜True⌝
  case vc1 => sorry
  case vc2 => sorry
  case vc3 => sorry
  case vc4 => sorry

Sebastian Graf (Aug 17 2025 at 13:48):

It seems that it fails to synthesize the instance for Std.PRange.HasFiniteRanges Std.PRange.BoundShape.open Nat which is one of the requirements of the spec lemma for PRanges. Not sure why that is; #synth Std.PRange.HasFiniteRanges Std.PRange.BoundShape.open Nat succeeds. Need to investigate.

Edit: You can work around it by inferring the instance vc1 before specifying the invariant inv1:

theorem bar_eq (n : Nat) : bar n = n := by
  generalize h : bar n = x
  apply Id.of_wp_run_eq h
  mvcgen
  case vc1 => exact inferInstance
  case inv1 =>
    exact ⇓⟨xs, out⟩ => ⌜True⌝
  case vc2 => sorry
  case vc3 => sorry
  case vc4 => sorry

Dexin Zhang (Oct 04 2025 at 19:49):

Is there a plan to support loop or while loops in this framework? This might be tricky, since loop is now a partial def, and even partial_fixpoint is said not to be planned.

Robin Arnez (Oct 05 2025 at 16:42):

Supporting repeat is tricky because as it is you can't do anything with it (as you said because of partial). If we want to change that we have to face a problem: We want it to still work for all monads (because it did before) but we also want to guarantee that it actually does something reasonable on infinite loop (and not always something completely random through Classical.choice). For example, an infinite loop in Option should be none. But then there are monads like ExceptT Bool Option. Without any more information than the monad instance, you can't distinguish between none, throw false and throw true but we want to ensure an infinite loop is none. Thus, we need more information, e.g. an additional type class. But then repeat would behave differently before and after defining an instance of such type class, where it would first return any of the three exceptions on infinite loop and after defining an instance return none in particular. Thus programs written before vs. after that point behave differently in logic, which could be confusing.
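
For concreteness (sketch), the three failure values in that example stack are all distinct terms:

-- `ExceptT ε m α` unfolds to `m (Except ε α)`, so in `ExceptT Bool Option`
-- a generic, `Monad`-only `repeat` has no way to prefer one of these:
example : ExceptT Bool Option Nat := ExceptT.mk none                  -- inner `none`
example : ExceptT Bool Option Nat := ExceptT.mk (some (.error false)) -- `throw false`
example : ExceptT Bool Option Nat := ExceptT.mk (some (.error true))  -- `throw true`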

(deleted) (Oct 05 2025 at 18:15):

Yeah let's not get into termination hell. I think we can do a lot with constrained looping primitives (recursors).

(deleted) (Oct 05 2025 at 18:16):

If you want while, or arbitrary recursion, it's possible to use an inductive predicate

Joachim Breitner (Oct 05 2025 at 18:35):

Would it be so bad if the loop and while keywords detect when they have a monad with suitable structure and have different, partial_fixpoint-like logical content that allows partial correctness proofs using mvcgen? All existing users likely only care about the compiled behavior, and that wouldn't change, would it?

(deleted) (Oct 05 2025 at 18:46):

No it's a can of worms

(deleted) (Oct 05 2025 at 18:47):

Let's say there's a program that prints hello world forever

(deleted) (Oct 05 2025 at 18:47):

How is the program represented

(deleted) (Oct 05 2025 at 18:47):

Things are nice and dandy if there are no side effects

Joachim Breitner (Oct 05 2025 at 19:26):

Already now, with partial, the logical content of such a program is unspecified. Are you saying users currently rely on that somehow?

Robin Arnez (Oct 05 2025 at 19:26):

My idea would've been:

class MonadLoop (m : Type u → Type v) [Monad m] where
  loop {α : Type u} : (α → m (ForInStep α)) → α → m α
  loop_of_exists {α : Type u} : ∀ f,
    (∃ g, ∀ a : α, g a = f a >>= (·.casesOn pure g)) →
    ∀ a : α, loop f a = f a >>= (·.casesOn pure (loop f))

noncomputable instance (priority := low) MonadLoop.default (m : Type u → Type v) [Monad m] :
    MonadLoop m where
  loop _ := Classical.epsilon _
  loop_of_exists _ h := Classical.epsilon_spec h

def MonadLoop.fixByOrder [Monad m] [∀ α, Lean.Order.CCPO (m α)] [Lean.Order.MonoBind m]
    (f : α → m (ForInStep α)) (a : α) : m α :=
  f a >>= fun | .done x => pure x | .yield x => fixByOrder f x
partial_fixpoint

instance [Monad m] [∀ α, Lean.Order.CCPO (m α)] [Lean.Order.MonoBind m] :
    MonadLoop m where
  loop := MonadLoop.fixByOrder
  loop_of_exists f h a := by
    rw [MonadLoop.fixByOrder]
    congr; funext
    split <;> rfl

So yeah, basically partial_fixpoint

Robin Arnez (Oct 05 2025 at 19:27):

Huỳnh Trần Khanh said:

How is the program represented

Do we care? There are already plans to make IO opaque, so might as well also make that aspect of it opaque with suitable specification.

Dexin Zhang (Oct 05 2025 at 22:34):

Robin Arnez said:

Supporting repeat is tricky because as it is you can't do anything with it (as you said because of partial). If we want to change that we have to face a problem: We want it to still work for all monads (because it did before) but we also want to guarantee that it actually does something reasonable on infinite loop (and not always something completely random through Classical.choice). For example, an infinite loop in Option should be none. But then there are monads like ExceptT Bool Option. Without any more information than the monad instance, you can't distinguish between none, throw false and throw true but we want to ensure an infinite loop is none. Thus, we need more information, e.g. an additional type class. But then repeat would behave differently before and after defining an instance of such type class, where it would first return any of the three exceptions on infinite loop and after defining an instance return none in particular. Thus programs written before vs. after that point behave differently in logic, which could be confusing.

Thanks for pointing it out. This is indeed tricky, and even partial_fixpoint uses none or Classical.choice as the value of a non-terminating function.

Dexin Zhang (Oct 05 2025 at 22:57):

I'm thinking that to prove a Hoare triple of a loop, one could require a termination measure to be specified (which is typical in e.g. Dafny). This would work with partial_fixpoint. This is a total Hoare triple, and I don't know if it is even possible to define a partial Hoare triple when we work with an arbitrary monad.

Sebastian Graf (Oct 06 2025 at 11:00):

Is there plan to support loop or while loop in this framework?

No, no plans yet. I think we'd first need to support generating definition bodies for partial defs in the first place.

I'm thinking that to prove a Hoare triple of a loop, one can require a termination measure to be specified (which is typical in e.g. Dafny).

Indeed this would be nice, and I think it's what Joachim suggests above. But I don't think it's possible without producing definition bodies taking some sort of termination proof as argument. See #lean4 > Extrinsic termination proofs for well-founded recursion.

Bas Spitters (Oct 06 2025 at 12:10):

Dijkstra monads forever: termination-sensitive specifications for interaction trees
https://dl.acm.org/doi/10.1145/3434307

Combines Dijkstra monad with ITrees for non-termination. @Sebastian Graf Have you compared this with your implementation?

Dexin Zhang (Oct 06 2025 at 14:50):

Sebastian Graf said:

Indeed this would be nice, and I think it's what Joachim suggests above. But I don't think it's possible without producing definition bodies taking some sort of termination proof as argument. See #lean4 > Extrinsic termination proofs for well-founded recursion.

That discussion looks nice! Though, instead of finding a general solution for partial functions, we could alternatively customize the ForIn Loop instance.

Sebastian Graf (Oct 07 2025 at 07:35):

Combines Dijkstra monad with ITrees for non-termination. @Sebastian Graf Have you compared this with your implementation?

Yes. They use ITrees as a semantic domain for a denotational semantics of a first-order imperative language IMP. We need to define an expansion of while from Lean's native do notation. It would be very confusing and not at all performant to involve an ITree-based monad in that expansion (and I have no idea how something like that could solve our problem). This has nothing to do with Dijkstra monads/Std.Do; it's a fundamental limitation of how Lean's while is implemented in terms of the ForIn Loop instance.

Though, instead of finding a general solution of partial functions, we can alternatively customize the instance of ForIn Loop.

Just yesterday I brainstormed how to implement a while ... decreasing <measure mentioning let muts> do ... primitive that takes a different expansion path than while ... do ..., one that would allow for reasoning about such loops. Coming up with the right kind of well-founded relation/accessibility predicate is challenging.
It might be promising to build on the Iterator framework, because its Finite predicate solves exactly this problem.


Last updated: Dec 20 2025 at 21:32 UTC