Zulip Chat Archive

Stream: std4

Topic: tailrec pmap


James Gallicchio (Jul 04 2023 at 16:36):

I am using pmap in one of my projects that has a relatively large list constant. Since pmap is not tailrec it causes eval to stack overflow. I know this is one of the functions that should be compiled to the identity function, but thoughts on adding a tailrec implementation with a csimp lemma?

James Gallicchio (Jul 04 2023 at 16:43):

std4#176 in case it is desired

Mac Malone (Jul 04 2023 at 20:52):

@James Gallicchio This inspired me and led me to a generalization of pmap. Like ite vs dite, we can define a memMap function that carries the knowledge that each mapped a is a \mem l and then pmap is just a specific application of that function. Below is a PoC. I am curious what you think of it.

namespace List

/--
`memMap` is like `map` but with a proof that `f` is only applied to elements
of the list.
-/
@[specialize] def memMap (l : List α) (f : (a : α)  a  l  β) :  List β :=
  match l with
  | [] => []
  | a::as =>
    f a (mem_cons.mpr <| .inl rfl) ::
    memMap as fun a h => f a (mem_cons.mpr <| .inr h)

/-- Tail-recursive version of `memMap`. -/
@[inline] def memMapTR (l : List α) (f : (a : α)  a  l  β) :  List β :=
  loop l f []
where
  @[specialize] loop (l : List α) (f : (a : α)  a  l  β) (acc : List β) : List β :=
    match l with
    | [] => acc.reverse
    | a::as =>
      loop as
      (fun a h => f a <| mem_cons.mpr <| .inr h)
      (f a (mem_cons.mpr <| .inl rfl) :: acc)

@[csimp] theorem memMap_eq_memMapTR : @memMap = @memMapTR := by
  funext α β l f
  unfold memMapTR
  suffices  acc, acc.reverse ++ memMap l f = memMapTR.loop l f acc by
    have := this []
    simp at this
    exact this
  intro acc
  induction l generalizing acc with
  | nil => simp [memMap, memMapTR.loop]
  | cons x xs ih =>
    simp [memMap, memMapTR.loop, ih]

@[inline] def pmap {p : α  Prop} (f : (a : α)  p a  β) (l : List α) (h :  a  l, p a) : List β :=
  l.memMap fun a mh => f a (h a mh)

Mac Malone (Jul 04 2023 at 20:57):

I guess memMap could also be called dmap to fit with dite, but I am not sure if that is appropriate.

Mario Carneiro (Jul 04 2023 at 20:57):

pmap, memMap and attach are all inter-definable

Mario Carneiro (Jul 04 2023 at 20:59):

performance-wise, I guess we should bias toward using attach, since that one is always O(1) while pmap is more like a map equivalent

Mac Malone (Jul 04 2023 at 21:00):

@Mario Carneiro Oh, I did not see attach. Yeah, everything should probably be phrased with attach as that can be @[extern] as id.

Mario Carneiro (Jul 04 2023 at 21:00):

it can also be implemented_by as unsafeCast

Mac Malone (Jul 04 2023 at 21:02):

It would be neat if the new compiler could have @[id] attribute that works like unsafeCast/@[extern] does a type-safe check to ensure the data layouts are, in fact, identitical.

Mario Carneiro (Jul 04 2023 at 21:02):

if it gets smart enough to do that, it may as well just do the optimization without annotation

Mac Malone (Jul 04 2023 at 21:05):

@Mario Carneiro @[id] would just require checking the function type signatures (at a data level), the compiler may not have enough information to determine the if the function is actually a no-op. Furthermore, it like e.g. @{musttail] would allow for a quick user-level check that an optimization is occurring as expected.

Mac Malone (Jul 04 2023 at 21:09):

@James Gallicchio from Mario's note, it seems like the best implementation of pmap is thus:

@[inline] def pmap {p : α  Prop} (f :  a, p a  β) (l : List α) (h :  a  l, p a) : List β :=
  l.attach.map fun a, mh => f a (h a mh)

Mac Malone (Jul 04 2023 at 21:10):

That way we get the tail-recursion for free (and the function is essentially identical to just map).

James Gallicchio (Jul 04 2023 at 21:11):

oh totally. are we cool with just having the unsafe implementedBy in Std? cuz I'll PR that instead in a heartbeat :P

Mac Malone (Jul 04 2023 at 21:12):

It is used a lot in core, so I would assume it is legal in Std?

Mac Malone (Jul 04 2023 at 21:19):

@Mario Carneiro?

Mario Carneiro (Jul 04 2023 at 21:19):

yup

Mario Carneiro (Jul 04 2023 at 21:19):

obviously any implemented_by need to be sound

James Gallicchio (Jul 04 2023 at 21:20):

will PR

Mario Carneiro (Jul 04 2023 at 21:20):

note, what mac has written is the implementation of pmap, not the definition of pmap which should remain as it is

Mario Carneiro (Jul 04 2023 at 21:21):

i.e. both pmap and attach will get implemented_by, with attach implemented as id and pmap implemented using map and attach

James Gallicchio (Jul 04 2023 at 21:22):

we could do a csimp lemma for pmap since it's provably equivalent

Mario Carneiro (Jul 04 2023 at 21:22):

true

Mac Malone (Jul 04 2023 at 21:25):

Mario Carneiro said:

note, what mac has written is the implementation of pmap, not the definition of pmap which should remain as it is

I am curious about this Mario. What makes attach + map less useful for proofs than pmap's current implementation? I would think they would be pretty similar when unfolded.

James Gallicchio (Jul 04 2023 at 21:27):

attach just unfolds less nicely iirc

James Gallicchio (Jul 04 2023 at 21:28):

like we probably still want a simp lemma that turns it into a pmap

Mario Carneiro (Jul 04 2023 at 21:29):

yeah, pmap has a better recursive structure for reasoning since p and f can stay fixed

Mario Carneiro (Jul 04 2023 at 21:29):

attach is just horrible to work with recursively, you have to cast it on every iteration

Mac Malone (Jul 04 2023 at 22:02):

Ah, make sense then. :thumbs_up: I would have thought there was some way to easily simp away differences, but apparently not.

James Gallicchio (Jul 05 2023 at 01:39):

Pushed a new version. It's a little bit of a hacky order, I define attach in terms of pmap still, but with an unsafe implemented_by, and then I prove a csimp lemma that substitutes attach+map for pmap.

James Gallicchio (Jul 06 2023 at 05:20):

If possible I'd like to get this merged soonish so it's upstreamed to mathlib soonish!


Last updated: Dec 20 2023 at 11:08 UTC