Zulip Chat Archive

Stream: lean4

Topic: simp(rocs) with state


Thomas Murrills (Jul 11 2024 at 06:35):

In metaprogramming, sometimes you'd like to refer to a state while executing simprocs—either to collect things the simproc sees and return them at the end of the procedure, or to refer to a cache the simproc is building up (or which has been built up prior to the simproc, in other actions, to avoid repeating work).

Crucially, you still want to wind up with a proof that the transformed expression is equal to the initial one via proofs of equality for subexpressions, so something like transform is inadequate.

I don't need the full power of simp; persisting state through a simproc-like procedure is enough. (So, it doesn't actually need to be simp-based per se; something like transformWithEqM would even be adequate.)

Is there something shaped like this somewhere already?

(My current approach hinges on managing an IO.Ref during a simproc, which I'd rather avoid! :) )

Tomas Skrivan (Jul 11 2024 at 11:31):

I ran into the same issue and unfortunately also settled on using IO.Ref. Also pattern conv tactic uses simp internally for this exact purpose, so I doubt there is a better solution right now.

It would be interesting to try to turn SimpM monad to a monad transformer SimpT. I was thinking about it a bit but I have no idea how to manage different simprocs living in different monads or lifting them to different ones.

Tomas Skrivan (Jul 11 2024 at 11:40):

But hey simp itself is using StateRefT to manage the state so using IO.Ref is not too bad.

Thomas Murrills (Jul 11 2024 at 23:50):

Hmm, did you run into any sharing issues? Currently my IO.Ref contains an Option A, and within the simproc I do the following dance to prevent the state from being shared:

let some s  stateRef.modifyGet fun s? => (s?,none) | throwError "no state"
let (r,s)  k.run s
stateRef.set s
return .done r

where k is the monad action performing the real work in my stateful monad, and caches things in s.

I can't help but wonder if this shuffling is suboptimal...! :)

Thomas Murrills (Jul 11 2024 at 23:51):

Oh—maybe I could feed the ref directly to StateRefT' instead of using StateRefT'.run? (Which I realized right after sending the above message, as is always the way... :) )

Mario Carneiro (Jul 11 2024 at 23:54):

I don't understand your setup. You can just call IO.mkRef to create a ref, then modify it in the simproc

Mario Carneiro (Jul 11 2024 at 23:54):

any monad extending IO has this capability

Thomas Murrills (Jul 11 2024 at 23:55):

Not if I'm relying on the state of other monads, like AtomM, right?

Mario Carneiro (Jul 11 2024 at 23:56):

you would have to save/restore that state as well

Thomas Murrills (Jul 11 2024 at 23:57):

That is, this is in a setup where you have monads with state that you'd like to run the simproc in (and definitions written in those monads), so going through and duplicating the definitions to have a version which modifies the ref directly is a whole lot of extra code

Mario Carneiro (Jul 11 2024 at 23:57):

I don't see why any duplication is necessary

Thomas Murrills (Jul 11 2024 at 23:57):

Mario Carneiro said:

you would have to save/restore that state as well

Right, which is what I'm doing with s, no? Am I misinterpreting?

Mario Carneiro (Jul 11 2024 at 23:58):

note that there is a general mechanism for saving and restoring state to pass it through another monad, namely MonadControlT

Thomas Murrills (Jul 11 2024 at 23:59):

Let's say I have a preexisting definition foo : Expr -> AtomM Expr and I want to run this in a simproc. What are you saying is the more obvious way to do this?

Thomas Murrills (Jul 12 2024 at 00:00):

(Specifically, with AtomM state persisting through the simproc)

Mario Carneiro (Jul 12 2024 at 00:01):

you probably need to implement MonadControl for AtomM, but the basic idea is to get the state from the ref, run the function, then store the resulting state back in the ref

Mario Carneiro (Jul 12 2024 at 00:03):

i.e. exactly what you are doing already, just packaged into a function

Thomas Murrills (Jul 12 2024 at 00:03):

Gotcha, ok :)

Thomas Murrills (Jul 12 2024 at 00:18):

Re: performance concerns: currently my setup involves initializing this special persisting ref, since I can't see a way to mkRef then feed it to a simproc. I figure anything initialized ought to be small, so I'm considering either

  • having IO.Ref (Option A), and unpacking the data each time as shown above
  • having an IO.Ref A and initializing it with some opaque value? (Not sure if this would do what I hope)
  • just initializing IO.Ref A with some basic value if this is overkill

and likewise I'm not sure if I should have separate refs storing the read-only context and the state, or bundle them in one ref and unpack them each time it's run. How expensive is reading from a ref vs. unpacking things, I wonder?

Mario Carneiro (Jul 12 2024 at 00:28):

I don't think the size of initialized objects is a concern, it's just a pointer regardless

Mario Carneiro (Jul 12 2024 at 00:30):

the bigger issue is the semantics (this is a global variable shared with any concurrent calls in the same process) and the locking overhead associated with atomically writing to a shared variable

Thomas Murrills (Jul 12 2024 at 00:32):

Hmm, does that suggest that taking the state out of the ref as I initially planned is better than feeding the ref directly to StateRefT'? Because then the global ref is only locked during the writes at the beginning and end?

Mario Carneiro (Jul 12 2024 at 00:33):

what do you mean by that?

Mario Carneiro (Jul 12 2024 at 00:33):

you can't easily lock an IO ref for a long time, the main way is to use modify with a pure function that takes a while

Mario Carneiro (Jul 12 2024 at 00:33):

in particular it's not really deadlock prone IIUC

Thomas Murrills (Jul 12 2024 at 00:34):

So, as an example of "feeding StateRefT' the ref", if I have k : StateRefT' ..., then evaluate k myGlobalRef, what happens? (Without run')

Mario Carneiro (Jul 12 2024 at 00:35):

If you can feed the ref via StateRefT' then this is definitely preferred, because then it won't be a global variable

Mario Carneiro (Jul 12 2024 at 00:35):

...oh, well that doesn't do much, that's just a complicated way of using the ref directly

Mario Carneiro (Jul 12 2024 at 00:36):

run' is just syntax around application

Mario Carneiro (Jul 12 2024 at 00:36):

and StateRefT' is just the reader monad over an IO.Ref

Thomas Murrills (Jul 12 2024 at 00:36):

Well, run' makes a fresh ref with mkRef, right? I was wondering if something different would happen if I didn't use a fresh ref, but used a global one (i.e. the one I initialized)!

Mario Carneiro (Jul 12 2024 at 00:37):

oh you are right, run' creates the ref

Mario Carneiro (Jul 12 2024 at 00:38):

if you just apply it to a global then it is a reader monad like I said

Thomas Murrills (Jul 12 2024 at 00:39):

Gotcha, ok—and just so I don't get my wires crossed this (:= applying it to a global ref) is better than extracting the data from the global ref, running StateRefT', then putting the resulting state back in the global ref (as in my initial message)?

Thomas Murrills (Jul 12 2024 at 00:41):

(It's a bit weird for me to think of StateRefT' applied to a global ref as a reader monad; I'm kind of imagining the StateRefT' as doing all sorts of writes to that global ref as it executes, and this makes me worry about the concurrency you mentioned)

Mario Carneiro (Jul 12 2024 at 00:44):

Thomas Murrills said:

Gotcha, ok—and just so I don't get my wires crossed this (:= applying it to a global ref) is better than extracting the data from the global ref, running StateRefT', then putting the resulting state back in the global ref (as in my initial message)?

No, they are the same

Mario Carneiro (Jul 12 2024 at 00:45):

It's a reader monad because all it does is give the code access to this IO.Ref variable, which it can use to call things like get and set

Mario Carneiro (Jul 12 2024 at 00:45):

the data isn't "in" the IO.Ref, you should just think of that as a pointer to the actual mutable memory location

Mario Carneiro (Jul 12 2024 at 00:46):

and if it's a global variable then there is no need to thread access around because you can just get the value directly

Thomas Murrills (Jul 12 2024 at 00:50):

Hmm, ok! That’s useful!

Thomas Murrills (Jul 12 2024 at 00:50):

Mario Carneiro said:

If you can feed the ref via StateRefT' then this is definitely preferred, because then it won't be a global variable

What were you referring to here, though?

Mario Carneiro (Jul 12 2024 at 00:53):

if you have control over the call to Simp.main then you can create the ref outside the scope of the call

Mario Carneiro (Jul 12 2024 at 00:54):

It's still possible to keep your state segregated without this but it's more complicated

Mario Carneiro (Jul 12 2024 at 00:54):

squeeze_simp implements this by using a global ref with a map from syntax locations to data

Mario Carneiro (Jul 12 2024 at 00:55):

so that even if two simps run at the same time they can access the right state

Mario Carneiro (Jul 12 2024 at 00:56):

if it's just a cache then you might be able to just have it global and not worry about it, but you have to ensure that e.g. contexts line up

Tomas Skrivan (Jul 12 2024 at 01:07):

Here is a basic setup I would use, but I'm not sure if it addresses all your concerns especially about running this in parallel

import Lean
import Qq

open Lean Meta Qq

def mySimproc (counter : IO.Ref Nat) (pre:=true) : Simp.Simproc := fun e => do
  counter.modify (fun n => n + 1)
  if pre then
    return .continue
  else
    return .done { expr :=  mkAppM ``id #[e]  }

#eval show MetaM Unit from do
  let e := q(fun x : Nat => let y := (x+0)*x; x+y)
  let preCounter  IO.mkRef 0
  let postCounter  IO.mkRef 0
  let (r,_)  Simp.main e {} {} {pre := mySimproc preCounter, post := mySimproc postCounter false}
  IO.println s!"result {← ppExpr r.expr}"
  IO.println s!"number of pre steps: {← preCounter.get}"
  IO.println s!"number of post steps: {← postCounter.get}"

Mario Carneiro (Jul 12 2024 at 01:08):

Running it like this solves the concerns about running in parallel

Mario Carneiro (Jul 12 2024 at 01:09):

the trouble arises when you use initialize to store the ref(s)

Mario Carneiro (Jul 12 2024 at 01:10):

here "in parallel" is referring to either independent simp calls from the same or different proofs in the same file, or the same simp call in different elaborations of the same file (e.g. after an edit) before the old elaboration has been cancelled

Thomas Murrills (Jul 12 2024 at 01:11):

Ah, ok! I don’t need initialize if I can feed in Simprocs directly like this (as opposed to feeding in their names, like in Meta.simpGoal); I can use Simp.main in my use case!

Tomas Skrivan (Jul 12 2024 at 01:11):

Right that is what you were saying with squeeze_simp that instead of having IO.Ref State for each call you have one global IO.Mutex (HashMap Location State) or something like that.

Mario Carneiro (Jul 12 2024 at 01:12):

right, that's the workaround for a kind of crappy situation

Mario Carneiro (Jul 12 2024 at 01:13):

and it's still not good enough since you can have multiple elaborations of the same file with the same Location

Mario Carneiro (Jul 12 2024 at 01:15):

in short: avoid global mutable state :upside_down:

Thomas Murrills (Jul 12 2024 at 01:30):

(Hmm, I guess one downside to using "manual" simprocs in pre/post like this is the lack of DiscrTree matching that comes with simprocs. But that's not too hard to bundle into the def. For posterity, if you want to match simp behavior: simp uses getMatchWithExtra when matching against simproc DiscrTrees, and elabSimprocKeys to get the DiscrTree keys.)

Notification Bot (Jul 12 2024 at 16:43):

A message was moved from this topic to #new members > Set theory game by Kyle Miller.


Last updated: May 02 2025 at 03:31 UTC