Zulip Chat Archive
Stream: lean4
Topic: Unusual Subarray behavior
François G. Dorais (May 07 2023 at 00:47):
I just noticed this weird behavior:
def a : Array Nat := #[0,1,2,3,4,5,6]
def b : Subarray Nat := a[1:5]
#eval (b.as, b.start, b.stop) -- (#[0,1,2,3,4,5,6], 1, 5)
def c : Subarray Nat := b[1:3]
#eval (c.as, c.start, c.stop) -- (#[1,2,3,4], 1, 3) !?
I would have expected the last line to give (#[0,1,2,3,4,5,6], 2, 4)
.
The reason for this odd behavior is that the a[start:stop]
notation assumes that a
is an array. So when calculating c
, Lean first coerces b
into an array and then calculates the subarray of that. This is very inefficient but...
Why is there a coercion from Subarray
to Array
at all? This is a very expensive coercion that copies the subarray elements one-by-one into a subarray (Array.ofSubarray
). I looked for where this coercion was used. So I modified the Lean code by removing this instance.
- It is used once in
src/Init/NotationExtra.lean
(easy patch) and nowhere else inInit
. - It is used a lot in
src/Lean
but I patched that by adding a very conservativescoped instance : CoeTail (Subarray α) (Array α)
inLean/Name.lean
(which is imported by a majority of files insrc/Lean
). - It is used in Std4 in
Array/Merge.lean
but what is actually needed there is an instance ofHAppend (Array α) (Subarray α) (Array α)
or similar. - I didn't check Mathlib4 since that would take a long while.
It seems to me that this unusual coercion is not of much use outside of core's src/Lean
. As for the uses there, there's a _lot_. However, they fall into a handful of patterns and it is not at all clear they are necessary. In fact, I would think that all these array copies would have a negative impact on performance. For example, one of the main culprits is Lean.mkAppN
but that function doesn't need its own copy of the arguments Array, it would work just as well using a Subarray, eliminating the need to copy the arguments.
Anyway, long story short, I think that:
- There shouldn't be a coercion from Subarray to Array: it's expensive (which is unusual for a coercion) and not that useful.
- The
[start:stop]
notation should work correctly on Subarrays, just changing the start/stop points without creating a new copy of the array. - There could be some use in a coercion from Array to Subarray, which is cheap and would allow to use arrays as streams without explicitly converting to a Subarray every time.
Andrew Carter (May 07 2023 at 04:47):
Is the expected behavior:
#eval (c.as, c.start, c.stop) -- (#[0,1,2,3,4,5,6], 2, 4)
François G. Dorais (May 07 2023 at 05:14):
Yes, fixed. Thanks!
Scott Morrison (May 07 2023 at 05:52):
@François G. Dorais, if you have a branch for Lean4 with this change, that compiles, I'm happy to benchmark mathlib4 against it.
Jannis Limperg (May 08 2023 at 13:59):
I agree with all these points. If you want to spend even more time on this, you could port the entire Array
API to Subarray
. (This can be done in std
of course.) Subarray
is often the better type for efficiency and therefore deserves just as much API.
François G. Dorais (Jul 30 2023 at 16:00):
I dropped the ball here but I finally filed an issue about this: https://github.com/leanprover/lean4/issues/2360
François G. Dorais (Jul 30 2023 at 16:03):
@Jannis Limperg I'm not sure I want to do this but it might be useful to expand your thoughts a bit...
PS: Apologies for the belated reply!
Jannis Limperg (Jul 30 2023 at 16:35):
What I meant is just that when you work with parts of an array, it is often much more efficient to consider the slices as Subarray
s than as Array
s, because the latter means copying the parts around. But the current impoverished API of Subarray
discourages its use, so the path of least resistance has suboptimal performance.
By the way, many functions on Array
s already have start
and stop
arguments that allow you to work only on a part of the array. Porting these to Subarray
would be trivial.
François G. Dorais (Jul 30 2023 at 16:51):
I totally agree! But with a coercion from Subarray to Array in core this is all basically useless optimization, isn't it?
Jannis Limperg (Jul 30 2023 at 16:58):
Not completely useless; you can still be careful and void the coercion. But I agree that the coercion is a footgun and should be removed.
Last updated: Dec 20 2023 at 11:08 UTC