Zulip Chat Archive
Stream: general
Topic: d_array.mmap
Johan Commelin (Apr 14 2020 at 09:31):
This code is (unc)lean. Can someone please tell me how to make it (c)lean?
namespace d_array universe variables u v w variables {n : ℕ} {α : fin n → Type u} lemma push_back_idx {j n} (h₁ : j < n + 1) (h₂ : j ≠ n) : j < n := nat.lt_of_le_and_ne (nat.le_of_lt_succ h₁) h₂ def mmap_core {β : fin n → Type v} {m : Type v → Type w} [monad m] (a : d_array n α) (f : Π i : fin n, α i → m (β i)) : ∀ i (h : i ≤ n), m (d_array i (β ∘ (λ j, ⟨j.1, lt_of_lt_of_le j.2 h⟩))) | 0 _ := pure d_array.nil | (i+1) h := do bs ← mmap_core i (le_of_lt h), b ← f ⟨i, h⟩ (a.read ⟨i, h⟩), pure $ { data := λ ⟨j, hj⟩, (if hj' : j = i then by { cases hj', exact b } else bs.read ⟨j, push_back_idx hj hj'⟩) } /-- Monadically map a function over the array. -/ def mmap {β : fin n → Type v} {m} [monad m] (a : d_array n α) (f : Π i : fin n, α i → m (β i)) : m (d_array n β) := have (β ∘ (λ j, ⟨j.1, j.2⟩ : fin n → fin n)) = β, by { funext j, cases j, refl }, by { rw ← this, exact a.mmap_core f _ (le_refl _) } end d_array
Mario Carneiro (Apr 14 2020 at 09:33):
Hm, mmap_core
doesn't look good, it's requilding the whole array on every iteration
Kenny Lau (Apr 14 2020 at 09:33):
can't something transverse
solve this?
Kevin Buzzard (Apr 14 2020 at 09:34):
Johan why are you currently interested in stuff like this? Is there some underlying mathematical reason behind it?
Johan Commelin (Apr 14 2020 at 09:35):
No... I'm trying to get rewrite_search
into mathlib. This is part of cleaning up the code. And we want the tactic to be fast.
Mario Carneiro (Apr 14 2020 at 09:36):
This looks like a hard function to implement efficiently without type-changing write
Johan Commelin (Apr 14 2020 at 09:36):
I'm maybe not the right person for the job right now... but maybe I can learn to become a reasonably good person for the job :sweat_smile:
Mario Carneiro (Apr 14 2020 at 09:36):
maybe kick it down to C++?
Johan Commelin (Apr 14 2020 at 09:36):
Mario Carneiro said:
maybe kick it down to C++?
Maybe not me?
Mario Carneiro (Apr 14 2020 at 09:38):
Seems like the best case scenario for lean code is you construct a case splitty function by iterating nat.cases_on
, and pass it to array.mk
Johan Commelin (Apr 14 2020 at 09:38):
If you want to turn Lean into hello-world spaghetti, then you should ask me to start touching the C++
Kevin Buzzard (Apr 14 2020 at 09:40):
Johan Commelin said:
I'm trying to get
rewrite_search
into mathlib.
That's a mathematical reason in some sense.
Johan Commelin (Apr 14 2020 at 09:49):
Mario Carneiro said:
Seems like the best case scenario for lean code is you construct a case splitty function by iterating
nat.cases_on
, and pass it toarray.mk
@Mario Carneiro I'm not sure if I know how to do this in an acceptable way, because I really don't know what is fast/allowed and what is slow/bad code...
Gabriel Ebner (Apr 14 2020 at 09:54):
Mario Carneiro said:
case splitty function by iterating
nat.cases_on
, and pass it toarray.mk
That has quadratic runtime. I see two other options:
- Define
array.mmap
(which is easy because we havearray.push_back
). Then defined_array.mmap
by convertingd_array n α
toarray n (sigma α)
and back. (Or just don't bother withd_array.mmap
at all.) - Define
d_array.push_back
(and add two characters tovm_array.cpp
for a fast VM implementation). Then defned_array.mmap
usingd_array.push_back
.
Johan Commelin (Apr 14 2020 at 09:55):
If the consensus is that d_array.mmap
is not important, I'll try to focus on array.mmap
instead (-;
Gabriel Ebner (Apr 14 2020 at 09:56):
Unless you absolutely love working on d_array.mmap
I would say go for array.mmap
instead.
Mario Carneiro (Apr 14 2020 at 10:01):
It would also be great if there was a way to create an uninitialized array
Mario Carneiro (Apr 14 2020 at 10:02):
Maybe there are tricks that can be done with unchecked cast
Mario Carneiro (Apr 14 2020 at 10:04):
Alternatively, there could be another d_array constructor that uses a state transition function (like A -> B x A) rather than a nat indexed function
Mario Carneiro (Apr 14 2020 at 10:09):
constant d_array.mmk {n : ℕ} {α : fin n → Type u} {m : Type u → Type v} [monad m] : (∀ i, m (α i)) → m (d_array n α)
Mario Carneiro (Apr 14 2020 at 10:11):
which makes mmap
easy to define:
def mmap {β : fin n → Type v} {m} [monad m] (a : d_array n α) (f : Π i : fin n, α i → m (β i)) : m (d_array n β) := d_array.mmk (λ i, f i (a.read i))
Reid Barton (Apr 14 2020 at 11:34):
If you could explain what it is you are trying to achieve in non-Lean terminology, I could probably tell you how it is done in GHC. I don't know the difference between d_array
and array
though.
Johan Commelin (Apr 14 2020 at 11:35):
d_array
is dependent array
Reid Barton (Apr 14 2020 at 11:35):
Okay, well obviously those don't exist in GHC.
Mario Carneiro (Apr 14 2020 at 11:47):
I don't think it makes much difference. The main annoying thing is that you can't write a clean push_back
on d_array
Mario Carneiro (Apr 14 2020 at 11:48):
My proposal above should have some haskell equivalent: a monadic constructor for array
Mario Carneiro (Apr 14 2020 at 11:49):
the haskell equivalent should have a type like nat -> (nat -> m A) -> m (array A)
Johan Commelin (Apr 14 2020 at 11:50):
I've pushed a commit that ditches d_array.(m)map
.
Reid Barton (Apr 14 2020 at 11:56):
Mario Carneiro said:
the haskell equivalent should have a type like
nat -> (nat -> m A) -> m (array A)
This is actually kind of unreasonable in general, since m
might be, like, the list monad
Reid Barton (Apr 14 2020 at 11:57):
so I think it's not possible to write a general implementation which specializes to good code for the identity or state monads
Mario Carneiro (Apr 14 2020 at 12:00):
I don't follow. It's just traverse
, right?
Mario Carneiro (Apr 14 2020 at 12:01):
If you did this in the list monad you would get a whole bunch of arrays
Reid Barton (Apr 14 2020 at 12:02):
Right, and so you can't implement this by working in-place with a single array. You'd have to copy.
Mario Carneiro (Apr 14 2020 at 12:03):
I don't really know how GHC works its magic here, but I guess I was imagining it would just inline the definitions until it looked linear
Reid Barton (Apr 14 2020 at 12:04):
hmm https://hackage.haskell.org/package/vector-0.12.1.2/docs/Data-Vector.html#v:generateM
Reid Barton (Apr 14 2020 at 12:05):
but the vector package is quite complicated and mysterious
Reid Barton (Apr 14 2020 at 12:07):
Oh, it looks like it relies on rewrite rules to swap in a good implementation when m = IO
(or m = ST s
)
Mario Carneiro (Apr 14 2020 at 12:07):
You are probably right that this approach is doomed in lean 3 though
Mario Carneiro (Apr 14 2020 at 12:07):
we probably have to hardcode the state monad
Reid Barton (Apr 14 2020 at 12:14):
Oh, also I think the actual Haskell answer to how do you do this when m
is the state monad is that you just use listArray
(construct an array from a list of elements) and build the list lazily
Reid Barton (Apr 14 2020 at 12:15):
GHC may or may not avoid the allocation of the individual list cells and thunks but it's still fairly cheap if it doesn't
Reid Barton (Apr 14 2020 at 12:15):
anyways, this is also not an option in Lean (3 or otherwise)
Mario Carneiro (Apr 14 2020 at 12:16):
yeah, a lazy list would also be a pretty optimal solution here
Mario Carneiro (Apr 14 2020 at 12:16):
although typing a dependent lazy list sounds like a pain
Mario Carneiro (Apr 14 2020 at 12:18):
actually now that I come to think about it that could be used as a replacement for the case splitty function I mentioned above without any problems
Mario Carneiro (Apr 14 2020 at 12:18):
it doesn't even need to be lazy
Last updated: Dec 20 2023 at 11:08 UTC