Zulip Chat Archive

Stream: mathlib4

Topic: Array


Kevin Buzzard (Nov 30 2022 at 16:37):

Array in Lean 4 is not array in Lean 3; the easiest way to see this is the observation that array eats a nat and Array doesn't (the nat can be extracted from the array but it's not part of the input).

OK so right now I feel morally obliged to port data.list.defs because I want to have lines 751 to 754 of it in mathlib4, but on line 66 there is list.to_array which is flagged as a dubious translation because Array and array don't match. The declaration is used in exotic files such as data.buffer.basic and also in data.array.lemmas. What are we doing with this array mismatch? Are we making Array' which is aligned with Lean 3 array?

Floris van Doorn (Nov 30 2022 at 17:08):

I'm inclined to say to not port array and just leave a porting note that you've deleted it. I don't think array was used in anything with mathematical content. It's used in hash_map, but that file is nowhere imported.

Floris van Doorn (Nov 30 2022 at 17:09):

Don't duplicate work with @Arien Malec, though (see https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/list.2Etraverse )

Jeremy Salwen (Jan 08 2023 at 21:31):

It seems to me that Array is missing a lot of basic lemmas. Am I looking in the wrong place, or should I try implementing them myself?

Examples:

a ++ #[] = a
#[] ++ a = a
a ++ toArray l = a ++ l
(a.push x) ++ l = a ++ (x::l)

Ruben Van de Velde (Jan 08 2023 at 21:33):

I think Array is new in lean4 - any lemmas in mathlib would be about List

Jeremy Salwen (Jan 08 2023 at 21:35):

There are a bunch of lemmas in Mathlib4 about Array (eg: https://leanprover-community.github.io/mathlib4_docs/Std/Data/Array/Lemmas.html) but it seems like still many are missing to me.

Sky Wilshaw (Jan 08 2023 at 21:36):

Those lemmas are in the Std namespace, not Mathlib. I think these are maintained by Lean's core team.

Jeremy Salwen (Jan 08 2023 at 21:38):

Ok, I see the distinction, Mathlib only has a single lemma https://leanprover-community.github.io/mathlib4_docs/Mathlib/Data/Array/Basic.html

Jeremy Salwen (Jan 08 2023 at 21:39):

Ok, so I should just move this discussion to the lean4 stream?

Sky Wilshaw (Jan 08 2023 at 21:40):

I think you can use things like docs4#Array.data_toArray to transport lemmas about List into lemmas about Array.

Jeremy Salwen (Jan 08 2023 at 21:44):

I think most Array operations are missing an equivalence proof to the corresponding List operation.

Sky Wilshaw (Jan 08 2023 at 21:45):

Ah, that sounds like something you'd have to do manually.

Jeremy Salwen (Jan 08 2023 at 21:45):

Yeah, is this something that would be good as a contribution to Std?

Sky Wilshaw (Jan 08 2023 at 21:46):

I'm not the person to ask unfortunately, but it sounds useful.

Yaël Dillies (Jan 08 2023 at 21:49):

Well maybe Array shouldn't have lemmas written about it, but I don't know.

Sky Wilshaw (Jan 08 2023 at 21:49):

I think the existence of docs4#Array.data_toArray is an indication that they want these lemmas though.

Jeremy Salwen (Jan 08 2023 at 21:51):

I think you need lemmas about Array to prove things about practical programs. But maybe there is an alternative I am missing?

Sky Wilshaw (Jan 08 2023 at 21:53):

That sounds reasonable to me. This chat is mostly focused on the maths that we can do from within Lean, so most people here probably aren't as interested in that part of the language.

Sky Wilshaw (Jan 08 2023 at 21:54):

Mathlib doesn't really seem like the place for Array lemmas, is my point.

Henrik Böving (Jan 08 2023 at 22:41):

Sky Wilshaw said:

Those lemmas are in the Std namespace, not Mathlib. I think these are maintained by Lean's core team.

std4 team != core team but they do have intersection

Jeremy Salwen said:

I think most Array operations are missing an equivalence proof to the corresponding List operation.

Yes this is because Array is a different data structure in memory than List. It is merely represented as a List in the Lean 4 type system but the runtime has special support for it so the proofs aren't all trivial.

Jeremy Salwen said:

Yeah, is this something that would be good as a contribution to Std?

There is an issue on std4 for this: https://github.com/leanprover/std4/issues/24

Yaël Dillies said:

Well maybe Array shouldn't have lemmas written about it, but I don't know.

Array should definitely have proofs written about it, it is in the majority of cases the more performant datastructure to use compared to lists so computer scientists doing verification of Lean programs will want to use it.

Sky Wilshaw said:

I think the existence of docs4#Array.data_toArray is an indication that they want these lemmas though.

Due to the inherent difference in the operations of Array operations and List operations this lemma is not actually as useful as it might seem (but it and other lemmas that show similarities between List and Array operations are still helpful none the less). A few examples for why we might want to write algorithms on Arrays inherently different from List ones:

  • You can easily add to the front of a List with a cons but you never want to do that with a slice of memory since it requires a full copy
  • You can easily share a List but you never want to do that with an Array since it requires a full copy upon access, as long as the array is uniquely referenced it is in fact updated in place
  • You don't actually want to use the Array.data function in practical code (unless that code is merely a facade for @[extern]) because turning a runtime Array into a List is O(n). What you usually want to do is write your algorithm in terms of indices like you would with regular arrays so you end up with completely different patterns.

That being said if you can express your algorithm in terms of primitive operations that are implemented on both Array and List (fold, map, filter ...) it does still help to have equalities between them to apply list lemmas in verification.

Arien Malec (Jan 08 2023 at 22:56):

To my mind, mathlib should have theorems about…math. Theorems about the soundness or correctness of general purpose data types or APIs probably belong upstream (core or std), again IMHO.

Scott Morrison (Jan 09 2023 at 02:40):

mathlib4 is always going to have a lower barrier to entry than std4. So even if the "right place" for a lemma (e.g. about Array) is in std4, we shouldn't hesitate at all about putting it in mathlib4 in the meantime. Once it has seen some use / been refactored a few times / Mario says it's time for it to be in std, it is easy to migrate it down in std4.

Eric Wieser (Jan 09 2023 at 19:15):

This sounds like there is probably a niche for some container typeclasses so that we don't have to write the same lemmas about array and list

Henrik Böving (Jan 09 2023 at 19:57):

That too is in the works on the std4 side.

Thomas Murrills (Jan 09 2023 at 21:51):

I found myself looking for these array lemmas occasionally too, and I think in those cases they would have been pretty useful.

There are two reasons I think they're not terribly out of place in mathlib4 per se:

  • sometimes you need to reason about a mathematical definition which is implemented behind the scenes in a performant way, and this exposes some Array stuff—this doesn't usually happen but it does sometimes happen
  • sometimes you as a mathematician do want to write performant code! Especially if you're doing, say, some kind of explicit combinatorics or computational homology or something, where collections of things can get large and need speedy operations.

So, in addition to the reasons mentioned above, I don't think the perceived "unmathiness" of these lemmas, so to speak, should prevent them from being in mathlib4 immediately. :)

Eric Wieser (Jan 09 2023 at 22:10):

I think @Yaël Dillies point is that if you want to prove things about arrays, you can just unfold the array definitions and then use the list proofs

Eric Wieser (Jan 09 2023 at 22:11):

Which needs n2+2nn^2 + 2n lemmas instead of 2n2+2n2n^2 + 2n, where n is the size of the list api

Jeremy Salwen (Jan 10 2023 at 19:16):

So I have done a few proofs about Array, and I have tried to do them the suggested way of converting to List operations.

If I am proving an equality of two Array expressions, for the most part, apply Array.exp'; simp will solve it. However, sometimes when the expressions are more complex, this will not completely simplify away the expression. Here is an example where it took two passes:

import Std.Data.List.Basic
import Mathlib.data.list.basic

open Lean

def Array.modifyHead (F: α α) (a:Array α): Array α := Array.modify a 0 F

@[simp]
lemma Array.modifyHead_data (a:Array α): (Array.modifyHead f a).data = List.modifyHead f a.data := by sorry

lemma examp (acc: Array α): Array.push #[] acc = Array.modifyHead (Array.append acc) (Array.push #[] #[]):= by
  apply Array.ext'
  simp
  -- We still have the goal acc = acc ++ #[], so we need to do it again
  apply Array.ext'
  simp

Additionally, this only works if my goal is directly an equality of Array expressions. A lot of the time, you will have Array subexpressions nested in some larger expression, and you'd like to just simp everything away, but you need to rewrite each Array subexpression one by one using this technique.

Both of these limitations make my approach feel like a bit of a hack. If Array lemmas were added to mathlib, then simp would just handle all these issue gracefully. But if we don't add all the Array lemmas, it still seems like it should be possible to write a smarter tactic that can act as if all these lemmas were present, by intelligently using the isomorphism to List.

Eric Wieser (Jan 10 2023 at 19:18):

docs4#Array.modify is unusual because it's not implemented in terms of list.modify

Eric Wieser (Jan 10 2023 at 19:21):

But actually, the difficulty here is that you're working with an Array of Arrays; so it's not surprising that you have to use ext at each level

Eric Wieser (Jan 10 2023 at 19:22):

This works:

lemma Array.ext_iff {α : Type u_1} {as bs : Array α} :  as = bs  as.data = bs.data := by sorry

lemma examp (acc: Array α): Array.push #[] acc = Array.modifyHead (Array.append acc) (Array.push #[] #[]):= by
  simp [Array.ext_iff]

Jeremy Salwen (Jan 10 2023 at 20:24):

Very cool! That fixes all my issues with having to apply Array.ext' twice. However, it still doesn't fix the issue where simp will handle List subepxressions, but simp Array.ext_iff won't handle Array Subexpressions. Here is a MWE:

import Std.Data.List.Basic
import Mathlib.data.list.basic

open Lean

lemma Array.ext_iff {α : Type u_1} {as bs : Array α} :  as = bs  as.data = bs.data := by sorry

def foo_array (a: Array (Array α)): Array (Array α) := sorry

lemma examp_array (r rest: Array (Array α)):
 foo_array (Array.push (r ++ rest) acc) = r ++ foo_array (Array.push rest acc) := by
 simp [Array.ext_iff]
 -- Array subexpressions are not normalized!

def foo_list (a: List (List α)): List (List α) := sorry

lemma examp_list (r rest: List (List α)):
 foo_list (List.concat (r ++ rest) acc) = r ++ foo_list (List.concat rest acc) := by
 simp
 -- List subexpressions are normalized!

Eric Wieser (Jan 10 2023 at 20:28):

The argument here would be that this is hard because you didn't prove anything about (foo_array a).to_list

Eric Wieser (Jan 10 2023 at 20:29):

Eric Wieser said:

Which needs n2+2nn^2 + 2n lemmas instead of 2n2+2n2n^2 + 2n, where n is the size of the list api

The 2n2n lemmas here are the ones about Array.toList and List.toArray, combined with the list operations

Jeremy Salwen (Jan 10 2023 at 20:42):

Eric Wieser said:

The argument here would be that this is hard because you didn't prove anything about (foo_array a).to_list

I agree that's why it's hard, but the question is why does it have to be hard. Simplifying expressions involving Arrays seems like a common and reasonable thing to do while proving things about Arrays. Why wouldn't we want to have some sort of tactic to do that?

Yaël Dillies (Jan 10 2023 at 20:43):

It would be possible, but probably not worth the effort.

Yaël Dillies (Jan 10 2023 at 20:43):

Look, docs#finset and docs#set are basically two of the biggest APIs in mathlib, and we still don't have a tactic to transfer results from one to the other.

Yaël Dillies (Jan 10 2023 at 20:44):

By no means am I trying to discourage someone to work on such a tactic, I'm just saying that if you don't take the matter in hands, then likely nobody will.

Yaël Dillies (Jan 10 2023 at 20:48):

Regarding set and finset, I found that coe_inj.2 $ by { norm_cast, exact set.some_lemma } (or the coe_subset version thereof) was a reliable way of proving finset.some_lemma. By now, there are many lemmas in mathlib I proved that way. So I think it is a sensible heuristic for implementing such a tactic.

Jeremy Salwen (Jan 10 2023 at 20:49):

I would give it a shot, I am just not sure which approach would be right. I can imagine a few ways it could be implemented: generating new lemmas, configuring simp to handle things better, preprocessing and passing to simp, or walking the expression tree directly.

Eric Wieser (Jan 10 2023 at 20:51):

Note that the way such a tactic would likely work is:

  1. Prove everything we want about lists (the n2n^2 lemmas I talked about before)
  2. Prove the links between the lists and array definitions (the 2n2n lemmas)
  3. Write a tactic to generate the other n2n^2 lemmas

The thing is, once you do the first two steps, you can already get what you want with simp

Yaël Dillies (Jan 10 2023 at 20:52):

But I also think there's something more general to get out of this, because such a tactic could likely replace to_additive's current fragile implementation: Instead of regenerating the proof term every time, apply transfer lemmas then the original theorem.

Eric Wieser (Jan 10 2023 at 20:53):

I suspect @Jeremy Salwen is asking this question because they want to write proofs about arrays in terms of other lemmas about arrays; but if you enable this workflow then the list API falls behind the array API instead.

Eric Wieser (Jan 10 2023 at 20:53):

If you're not careful you end up in a mess where some lemmas are transferred forwards and others backwards and now you have horrible import cycles.

Yaël Dillies (Jan 10 2023 at 20:54):

Not sure I'm following. How does this workflow favor one over the other?

Eric Wieser (Jan 10 2023 at 20:55):

If you have lemmas about Array available then you're tempted into writing more complex lemmas about arrays

Yaël Dillies (Jan 10 2023 at 20:55):

Ah yeah I see what you mean.

Eric Wieser (Jan 10 2023 at 20:55):

If you have only list lemmas available, then you always write the list lemma first

Yaël Dillies (Jan 10 2023 at 20:56):

My tactic suggestion would kind of prevent that, as we would only allow translation one way.

Eric Wieser (Jan 10 2023 at 20:56):

You'd stil need to tell everyone PRing an array proof "please write the list proof instead". But I guess this already happens with to_additive.

Jeremy Salwen (Jan 10 2023 at 20:56):

Eric Wieser said:

I suspect Jeremy Salwen is asking this question because they want to write proofs about arrays in terms of other lemmas about arrays; but if you enable this workflow then the list API falls behind the array API instead.

Mostly this is coming from writing proofs about programs. To write a proof by induction about a function operating on Arrays, you end up with the induction hypothesis as a "lemma about arrays".

Yaël Dillies (Jan 10 2023 at 20:57):

Yeah but if all lemmas in the Array files are proved with by translate, I won't need to shout that loud to make myself understood.

Jeremy Salwen (Jan 10 2023 at 21:02):

If there was a version of simp that understood these isomorphisms that would be cool. It would know for certain types that it should simplify by first translating them to the canonical type and simplifying them there, and translating back.

Notification Bot (Jan 10 2023 at 21:51):

19 messages were moved from this topic to #mathlib4 > A translate tactic by Eric Wieser.

Mario Carneiro (Jan 11 2023 at 01:27):

Yaël Dillies said:

Regarding set and finset, I found that coe_inj.2 $ by { norm_cast, exact set.some_lemma } (or the coe_subset version thereof) was a reliable way of proving finset.some_lemma. By now, there are many lemmas in mathlib I proved that way. So I think it is a sensible heuristic for implementing such a tactic.

This is basically what the norm_cast tactic does, so I would say the tactic to do this already exists

Jeremy Salwen (Apr 30 2023 at 17:21):

I have experimented a bit with trying to write a tactic that will prove things about Arrays using the list lemmas. I think the best approach I found is to add additional lemmas which can rewrite all Array operations in terms of the corresponding list operation:

import Std.Data.List.Basic
import Mathlib.data.list.basic

open Lean

def Array.modifyHead (F: α α) (a:Array α): Array α := Array.modify a 0 F

@[simp]
lemma Array.modifyHead_data (a:Array α): (Array.modifyHead f a).data = List.modifyHead f a.data := by sorry

lemma Array.append_translated {α}: @Array.append α = (λ a b => List.toArray (List.append (a.data) (b.data))) := sorry
lemma Array.push_translated {α}: @Array.push α = (λ a v => List.toArray (a.data ++ [v])) := sorry
lemma Array.modifyHead_translated {α}: @Array.modifyHead α = (λ f a => List.toArray (List.modifyHead f a.data)) := sorry

lemma examp (acc: Array α): Array.push #[] acc = Array.modifyHead (Array.append acc) (Array.push #[] #[]):= by
  simp [ Array.append_translated, Array.push_translated, Array.modifyHead_translated, Array.modifyHead_translated]
 --- Solves it!

In the cases I've tried, it seems like trying to translate the operations is much more robust than translating the values.

The interesting thing is that these lemmas are basically a slightly different form of the lemmas that show up in Std. I think that there should be a way to automatically derive these lemmas from the version that is in Std.


Last updated: Dec 20 2023 at 11:08 UTC