Zulip Chat Archive

Stream: lean4

Topic: Unusual Subarray behavior


François G. Dorais (May 07 2023 at 00:47):

I just noticed this weird behavior:

def a : Array Nat := #[0,1,2,3,4,5,6]

def b : Subarray Nat := a[1:5]

#eval (b.as, b.start, b.stop) -- (#[0,1,2,3,4,5,6], 1, 5)

def c : Subarray Nat := b[1:3]

#eval (c.as, c.start, c.stop) -- (#[1,2,3,4], 1, 3) !?

I would have expected the last line to give (#[0,1,2,3,4,5,6], 2, 4).

The reason for this odd behavior is that the a[start:stop] notation assumes that a is an array. So when calculating c, Lean first coerces b into an array and then calculates the subarray of that. This is very inefficient but...

Why is there a coercion from Subarray to Array at all? This is a very expensive coercion that copies the subarray elements one-by-one into a subarray (Array.ofSubarray). I looked for where this coercion was used. So I modified the Lean code by removing this instance.

  • It is used once in src/Init/NotationExtra.lean (easy patch) and nowhere else in Init.
  • It is used a lot in src/Lean but I patched that by adding a very conservative scoped instance : CoeTail (Subarray α) (Array α) in Lean/Name.lean (which is imported by a majority of files in src/Lean).
  • It is used in Std4 in Array/Merge.lean but what is actually needed there is an instance of HAppend (Array α) (Subarray α) (Array α) or similar.
  • I didn't check Mathlib4 since that would take a long while.

It seems to me that this unusual coercion is not of much use outside of core's src/Lean. As for the uses there, there's a _lot_. However, they fall into a handful of patterns and it is not at all clear they are necessary. In fact, I would think that all these array copies would have a negative impact on performance. For example, one of the main culprits is Lean.mkAppN but that function doesn't need its own copy of the arguments Array, it would work just as well using a Subarray, eliminating the need to copy the arguments.

Anyway, long story short, I think that:

  1. There shouldn't be a coercion from Subarray to Array: it's expensive (which is unusual for a coercion) and not that useful.
  2. The [start:stop] notation should work correctly on Subarrays, just changing the start/stop points without creating a new copy of the array.
  3. There could be some use in a coercion from Array to Subarray, which is cheap and would allow to use arrays as streams without explicitly converting to a Subarray every time.

Andrew Carter (May 07 2023 at 04:47):

Is the expected behavior:

#eval (c.as, c.start, c.stop) -- (#[0,1,2,3,4,5,6], 2, 4)

François G. Dorais (May 07 2023 at 05:14):

Yes, fixed. Thanks!

Scott Morrison (May 07 2023 at 05:52):

@François G. Dorais, if you have a branch for Lean4 with this change, that compiles, I'm happy to benchmark mathlib4 against it.

Jannis Limperg (May 08 2023 at 13:59):

I agree with all these points. If you want to spend even more time on this, you could port the entire Array API to Subarray. (This can be done in std of course.) Subarray is often the better type for efficiency and therefore deserves just as much API.

François G. Dorais (Jul 30 2023 at 16:00):

I dropped the ball here but I finally filed an issue about this: https://github.com/leanprover/lean4/issues/2360

François G. Dorais (Jul 30 2023 at 16:03):

@Jannis Limperg I'm not sure I want to do this but it might be useful to expand your thoughts a bit...
PS: Apologies for the belated reply!

Jannis Limperg (Jul 30 2023 at 16:35):

What I meant is just that when you work with parts of an array, it is often much more efficient to consider the slices as Subarrays than as Arrays, because the latter means copying the parts around. But the current impoverished API of Subarray discourages its use, so the path of least resistance has suboptimal performance.

By the way, many functions on Arrays already have start and stop arguments that allow you to work only on a part of the array. Porting these to Subarray would be trivial.

François G. Dorais (Jul 30 2023 at 16:51):

I totally agree! But with a coercion from Subarray to Array in core this is all basically useless optimization, isn't it?

Jannis Limperg (Jul 30 2023 at 16:58):

Not completely useless; you can still be careful and void the coercion. But I agree that the coercion is a footgun and should be removed.


Last updated: Dec 20 2023 at 11:08 UTC