Zulip Chat Archive

Stream: general

Topic: d_array.mmap

Johan Commelin (Apr 14 2020 at 09:31):

This code is (unc)lean. Can someone please tell me how to make it (c)lean?

namespace d_array
universe variables u v w
variables {n : ℕ} {α : fin n → Type u}

lemma push_back_idx {j n} (h₁ : j < n + 1) (h₂ : j ≠ n) : j < n :=
nat.lt_of_le_and_ne (nat.le_of_lt_succ h₁) h₂

def mmap_core {β : fin n → Type v} {m : Type v → Type w} [monad m]
  (a : d_array n α) (f : Π i : fin n, α i → m (β i)) :
  ∀ i (h : i ≤ n), m (d_array i (β ∘ (λ j, ⟨j.1, lt_of_lt_of_le j.2 h⟩)))
| 0 _ := pure d_array.nil
| (i+1) h := do
    bs ← mmap_core i (le_of_lt h),
    b ← f ⟨i, h⟩ (a.read ⟨i, h⟩),
    pure $ { data := λ ⟨j, hj⟩, (if hj' : j = i
    then by { cases hj', exact b }
    else bs.read ⟨j, push_back_idx hj hj'⟩) }

/-- Monadically map a function over the array. -/
def mmap {β : fin n → Type v} {m} [monad m] (a : d_array n α) (f : Π i : fin n, α i → m (β i)) :
  m (d_array n β) :=
have (β ∘ (λ j, ⟨j.1, j.2⟩ : fin n → fin n)) = β, by { funext j, cases j, refl },
by { rw ← this, exact a.mmap_core f _ (le_refl _) }

end d_array

Mario Carneiro (Apr 14 2020 at 09:33):

Hm, mmap_core doesn't look good, it's requilding the whole array on every iteration

Kenny Lau (Apr 14 2020 at 09:33):

can't something transverse solve this?

Kevin Buzzard (Apr 14 2020 at 09:34):

Johan why are you currently interested in stuff like this? Is there some underlying mathematical reason behind it?

Johan Commelin (Apr 14 2020 at 09:35):

No... I'm trying to get rewrite_search into mathlib. This is part of cleaning up the code. And we want the tactic to be fast.

Mario Carneiro (Apr 14 2020 at 09:36):

This looks like a hard function to implement efficiently without type-changing write

Johan Commelin (Apr 14 2020 at 09:36):

I'm maybe not the right person for the job right now... but maybe I can learn to become a reasonably good person for the job :sweat_smile:

Mario Carneiro (Apr 14 2020 at 09:36):

maybe kick it down to C++?

Johan Commelin (Apr 14 2020 at 09:36):

Mario Carneiro said:

maybe kick it down to C++?

Maybe not me?

Mario Carneiro (Apr 14 2020 at 09:38):

Seems like the best case scenario for lean code is you construct a case splitty function by iterating nat.cases_on, and pass it to array.mk

Johan Commelin (Apr 14 2020 at 09:38):

If you want to turn Lean into hello-world spaghetti, then you should ask me to start touching the C++

Kevin Buzzard (Apr 14 2020 at 09:40):

Johan Commelin said:

I'm trying to get rewrite_search into mathlib.

That's a mathematical reason in some sense.

Johan Commelin (Apr 14 2020 at 09:49):

Mario Carneiro said:

Seems like the best case scenario for lean code is you construct a case splitty function by iterating nat.cases_on, and pass it to array.mk

@Mario Carneiro I'm not sure if I know how to do this in an acceptable way, because I really don't know what is fast/allowed and what is slow/bad code...

Gabriel Ebner (Apr 14 2020 at 09:54):

Mario Carneiro said:

case splitty function by iterating nat.cases_on, and pass it to array.mk

That has quadratic runtime. I see two other options:

Define array.mmap (which is easy because we have array.push_back). Then define d_array.mmap by converting d_array n α to array n (sigma α) and back. (Or just don't bother with d_array.mmap at all.)
Define d_array.push_back (and add two characters to vm_array.cpp for a fast VM implementation). Then defne d_array.mmap using d_array.push_back.

Johan Commelin (Apr 14 2020 at 09:55):

If the consensus is that d_array.mmap is not important, I'll try to focus on array.mmap instead (-;

Gabriel Ebner (Apr 14 2020 at 09:56):

Unless you absolutely love working on d_array.mmap I would say go for array.mmap instead.

Mario Carneiro (Apr 14 2020 at 10:01):

It would also be great if there was a way to create an uninitialized array

Mario Carneiro (Apr 14 2020 at 10:02):

Maybe there are tricks that can be done with unchecked cast

Mario Carneiro (Apr 14 2020 at 10:04):

Alternatively, there could be another d_array constructor that uses a state transition function (like A -> B x A) rather than a nat indexed function

Mario Carneiro (Apr 14 2020 at 10:09):

constant d_array.mmk {n : ℕ} {α : fin n → Type u} {m : Type u → Type v} [monad m] :
  (∀ i, m (α i)) → m (d_array n α)

Mario Carneiro (Apr 14 2020 at 10:11):

which makes mmap easy to define:

def mmap {β : fin n → Type v} {m} [monad m] (a : d_array n α) (f : Π i : fin n, α i → m (β i)) :
  m (d_array n β) :=
d_array.mmk (λ i, f i (a.read i))

Reid Barton (Apr 14 2020 at 11:34):

If you could explain what it is you are trying to achieve in non-Lean terminology, I could probably tell you how it is done in GHC. I don't know the difference between d_array and array though.

Johan Commelin (Apr 14 2020 at 11:35):

d_array is dependent array

Reid Barton (Apr 14 2020 at 11:35):

Okay, well obviously those don't exist in GHC.

Mario Carneiro (Apr 14 2020 at 11:47):

I don't think it makes much difference. The main annoying thing is that you can't write a clean push_back on d_array

Mario Carneiro (Apr 14 2020 at 11:48):

My proposal above should have some haskell equivalent: a monadic constructor for array

Mario Carneiro (Apr 14 2020 at 11:49):

the haskell equivalent should have a type like nat -> (nat -> m A) -> m (array A)

Johan Commelin (Apr 14 2020 at 11:50):

I've pushed a commit that ditches d_array.(m)map.

Reid Barton (Apr 14 2020 at 11:56):

Mario Carneiro said:

the haskell equivalent should have a type like nat -> (nat -> m A) -> m (array A)

This is actually kind of unreasonable in general, since m might be, like, the list monad

Reid Barton (Apr 14 2020 at 11:57):

so I think it's not possible to write a general implementation which specializes to good code for the identity or state monads

Mario Carneiro (Apr 14 2020 at 12:00):

I don't follow. It's just traverse, right?

Mario Carneiro (Apr 14 2020 at 12:01):

If you did this in the list monad you would get a whole bunch of arrays

Reid Barton (Apr 14 2020 at 12:02):

Right, and so you can't implement this by working in-place with a single array. You'd have to copy.

Mario Carneiro (Apr 14 2020 at 12:03):

I don't really know how GHC works its magic here, but I guess I was imagining it would just inline the definitions until it looked linear

Reid Barton (Apr 14 2020 at 12:04):

hmm https://hackage.haskell.org/package/vector-0.12.1.2/docs/Data-Vector.html#v:generateM

Reid Barton (Apr 14 2020 at 12:05):

but the vector package is quite complicated and mysterious

Reid Barton (Apr 14 2020 at 12:07):

Oh, it looks like it relies on rewrite rules to swap in a good implementation when m = IO (or m = ST s)

Mario Carneiro (Apr 14 2020 at 12:07):

You are probably right that this approach is doomed in lean 3 though

Mario Carneiro (Apr 14 2020 at 12:07):

we probably have to hardcode the state monad

Reid Barton (Apr 14 2020 at 12:14):

Oh, also I think the actual Haskell answer to how do you do this when m is the state monad is that you just use listArray (construct an array from a list of elements) and build the list lazily

Reid Barton (Apr 14 2020 at 12:15):

GHC may or may not avoid the allocation of the individual list cells and thunks but it's still fairly cheap if it doesn't

Reid Barton (Apr 14 2020 at 12:15):

anyways, this is also not an option in Lean (3 or otherwise)

Mario Carneiro (Apr 14 2020 at 12:16):

yeah, a lazy list would also be a pretty optimal solution here

Mario Carneiro (Apr 14 2020 at 12:16):

although typing a dependent lazy list sounds like a pain

Mario Carneiro (Apr 14 2020 at 12:18):

actually now that I come to think about it that could be used as a replacement for the case splitty function I mentioned above without any problems

Mario Carneiro (Apr 14 2020 at 12:18):

it doesn't even need to be lazy

Last updated: May 02 2025 at 03:31 UTC