Zulip Chat Archive

Stream: batteries

Topic: RFC: Arrays with fixed length

Mario Carneiro (Mar 28 2024 at 21:12):

The fact that we don't have a type {arr : Array A // arr.size = n} is a consistent source of pain when doing monadic verification of code containing arrays. The set function can potentially change arr.size, so any time you are using set you need to carry around something saying actually the size is still what it was before so that you can still index into it with Fin n indices. This code is often not just in the verification theorems but also in the code itself, either because it wants to use the non-panicking array indexing function Array.get or because some property is needed about the contents of the array (necessarily stated only over the elements actually in the array) in order to prove termination. Examples of this include UnionFind and HashMap from Std, as well as some larger software verification projects I've been working on, or most recently this example of doing proofs about do notation, which was supposed to be exemplary but ended up being something kind of embarrassing to show.

Defining the type and operations should be relatively straightforward: I mean it to be exactly that subtype (or possibly a structure), and all operations are just filtered through. Code performance should be the same as Array, except that some functions pass n unnecessarily. Inlining should hopefully mitigate this issue.

Another possible improvement is to use a representation that avoids duplicate information by not storing the length. However:

This would require changes to the runtime and compiler, and my sense is that this is a way to ensure it won't happen for 1+ year
Although the title says "Arrays with fixed length", the length here isn't really "fixed", it's only counted externally; one can still have a push method and we do not want to lose the dynamic array doubling behavior, so keeping the capacity and length separate is desirable.
The runtime does not have type information, so we need to store both capacity and length in the runtime value regardless, so that the right objects are destroyed when an array object is freed.

Regarding the name, I don't have any good ideas. CArray A n was used in the previous discussion on dependent arrays (which is a separate thing). In any case I'd like to get this in std soonish. Thoughts? @Joe Hendrix @Scott Morrison

Joachim Breitner (Mar 28 2024 at 21:34):

Yes, please.

Although I found that where I wanted such arrays, I quickly also wanted a variant where the element type can depend on their position, when nesting them. But I guess that digresses the thread…

Mario Carneiro (Mar 28 2024 at 21:35):

yes, that would be the "dependent array" case

Mario Carneiro (Mar 28 2024 at 21:36):

that one has some additional runtime implications though (you can't get inline layout if all the types in the array are different)

Mario Carneiro (Mar 28 2024 at 21:37):

my current way of handling such cases with regular arrays (when the "underlying type" is actually the same, i.e. no storing a Nat next to a String) is to have a hypothesis on the side saying the property that holds of every element of the array

Joe Hendrix (Mar 28 2024 at 21:38):

I think this is common enough that it could go in Lean itself. I discussed the homogenous case with Scott yesterday since it was useful for a formalization of AES. Here's a link to that implementation if folks are interested and AES using it is here.

Mario Carneiro (Mar 28 2024 at 21:39):

Is it going to be used in lean?

Joe Hendrix (Mar 28 2024 at 21:44):

Potentially, but that's just at the discussion stage.

I've had some discussions (including with you) where we upstream more data structures to reduce duplication and improve consistency (e.g., unify Std.HashMap and Lean.HashMap).

Another factor is that Vector and BitVec should use consistent names and lemmas and it may be easier if they are in the same package.

Mario Carneiro (Mar 28 2024 at 21:45):

I think that if we upstream it at all we should do so later after the API is settled

François G. Dorais (Mar 28 2024 at 21:46):

This is definitely needed. However, Subarray already has most of the required features and some extra (and some bugs). So why not just overhaul Subarray?

I've often thought that maybe Subarray should be the main array type and Array should be a low-level array-storage type.

Mario Carneiro (Mar 28 2024 at 21:50):

Another factor is that Vector and BitVec should use consistent names and lemmas and it may be easier if they are in the same package.

This is completely unrelated, we can name them whatever in whatever repo they end up in. Besides this, I would say that BitVec is the outlier here, all of the array like types are called Array except for BitVec

Mario Carneiro (Mar 28 2024 at 21:52):

I'm not against the name Vector, but it may be problematic for mathlib which has actual vector spaces and may not like us taking the name for something else

Kyle Miller (Mar 28 2024 at 21:58):

@François G. Dorais docs#Subarray has the complexity that it contains elements outside the range, so it doesn't have the right extensionality lemma. I think it's necessary that Subarray work like this, because it's supposed to be an inexpensive view into an Array.

There's a way around this by using a Quot, but that adds a lot of complexity.

There are also the issue that Subarray doesn't expose the length of the subarray as a parameter, which I don't think you'd want to have exposed that way.

Eric Wieser (Mar 28 2024 at 21:58):

Mathlib has made do with docs#Vector referring to Lists of fixed length forever, so I don't think that's a concern

Mario Carneiro (Mar 28 2024 at 21:58):

you also can't push to a Subarray

Mario Carneiro (Mar 28 2024 at 21:59):

let's move the discussion about movement from Std to Init to another topic

Joe Hendrix (Mar 28 2024 at 22:03):

I noticed was that once one has arrays with external lengths, one often needs matrices quickly afterwards. Matrices seem like an addition that should be deferred though.

Joachim Breitner (Mar 28 2024 at 22:41):

Mario Carneiro said:

that one has some additional runtime implications though (you can't get inline layout if all the types in the array are different)

With inline layout, do you mean a packed non-pointer representation? Do we have that in lean arrays now?

Henrik Böving (Mar 28 2024 at 23:08):

No, only ByteArray and FloatArray have this power right now afaik

François G. Dorais (Mar 28 2024 at 23:24):

Kyle Miller said:

François G. Dorais docs#Subarray has the complexity that it contains elements outside the range, so it doesn't have the right extensionality lemma. I think it's necessary that Subarray work like this, because it's supposed to be an inexpensive view into an Array.

Indeed, I didn't flesh out my thoughts here. You're right that Subarray is not a perfect alternative to fixed-length arrays. They do need an overhaul because they are currently not "an inexpensive view into an Array" because of bugs like lean4#2360.

My second sentence is the one I should have fleshed out more. The idea is to move the current Array to a more bare-bones type, probably unsafe, not for direct public consumption, that just handles memory storage operations. And then have the actual public Array be implemented more like Subarray, with more transparent operations where it is visible when a copy can or cannot happen and where operations like set transparently don't modify size.

That said, this is along the same lines as Mario's "Another possible improvement" from above, so not for immediate consideration.

There are also the issue that Subarray doesn't expose the length of the subarray as a parameter, which I don't think you'd want to have exposed that way.

I believe this is what computed fields are for.

François G. Dorais (Mar 28 2024 at 23:28):

Henrik Böving said:

No, only ByteArray and FloatArray have this power right now afaik

But the infrastructure for more scalar arrays is already there. I once played with implementing different scalar arrays (e.g. native UInt* arrays). It is possible but tricky to do well from outside core.

Mario Carneiro (Mar 29 2024 at 01:42):

/poll What should the name be for arrays of fixed size?
CArray
Array'
Array (and change the core type to something else)
DArray
Vec
Vector

François G. Dorais (Mar 29 2024 at 02:05):

What's the D in DArray?

Mario Carneiro (Mar 29 2024 at 02:06):

dependent, as in "dependent type over length"

Mario Carneiro (Mar 29 2024 at 02:06):

It's a dependent type the same way Fin n is

Mario Carneiro (Mar 29 2024 at 02:07):

but it has an unfortunate collision with the other way that arrays could be dependent, i.e. storing different types in each slot of the array

Mario Carneiro (Mar 29 2024 at 02:08):

Note that the lean 3 name of this type is array; the type now called Array was called buffer and they were defined in the opposite order, with buffer α := Σ n, array α n

François G. Dorais (Mar 29 2024 at 02:26):

Interesting historical note. Thank you! I think I would prefer it the old way but that would mean a major runtime overhaul, so not anytime soon.

Joachim Breitner (Mar 29 2024 at 10:40):

I’d reserve DArray for the index-dependent version

DArray (size : Nat)  (i : Fin size → Type _) : Type _

(Or DVector if the above becomes Vector)

Alex Keizer (Mar 29 2024 at 14:27):

At the risk of even more bike-shedding: Do we want to think about what the name should be for Lists of fixed length as well? These are currently called Vector in mathlib, which conflicts with the most popular option of the poll.

Eric Wieser (Mar 29 2024 at 22:00):

If we want to copy C++, then Vector (or Vec to match rust) should be the variable-length array and Array should be the fixed length one

Mario Carneiro (Mar 29 2024 at 22:01):

well, even the fixed length one isn't really an Array in the C++ sense

Mario Carneiro (Mar 29 2024 at 22:01):

because it's not actually fixed length, it has all the same dynamic array behaviors as the other type

Mario Carneiro (Mar 29 2024 at 22:02):

there is still potentially some space for an actual fixed size Array type like the haskell type

Alex Keizer (Apr 01 2024 at 11:38):

That distinction is mostly about whether the data lives on the stack vs the heap, which is not a thing at all for us. As Mario points out, mathlibs Vector is not fixed-length, it is statically-known length. I'd say in this case we shouldn't worry too much about mirroring C++'s Array vs Vector distinction

Eric Wieser (Apr 01 2024 at 11:48):

I don't agree that C++'s array vs vector is stack vs heap; unique_ptr<array<T,n>> lives on the heap, and std::vector<T,customalloc> can (probably) live on the stack.

Alex Keizer (Apr 01 2024 at 13:12):

Ah, I guess my understanding of C++ was flawed

Mario Carneiro (Apr 01 2024 at 23:05):

I propose we go forward with the name Vector, unless others object, cc: @Joe Hendrix @Scott Morrison . I'll try to open a PR based on Joe's code today or tomorrow and add some lemmas.

As for the mathlib type of the same name, I think it won't be too bad to rename it since mathlib also has "vector2" and "vector3" (I think vector2 got deleted at some point though). We can open a topic on #mathlib4 to discuss it

Eric Wieser (Apr 01 2024 at 23:12):

What's the origin of the "vector" terminology for fixed length? Rust, C++, Go, and Haskell all seem to use Vector for dynamic-length objects.

Eric Wieser (Apr 01 2024 at 23:15):

vector2 never existed as a type; data/vector2.lean was a file about Vector that couldn't call itself data.vector because this clashed with a file of the same name in core.

Mario Carneiro (Apr 01 2024 at 23:16):

this is a dynamic length object though, that's what my point was above

Mario Carneiro (Apr 01 2024 at 23:17):

I'm not even sure I would call it a type with "statically-known length", because in dependent type theory everything is a bit wishy washy regarding the static/dynamic dichotomy (one of the reasons compiling DTT to something other than a uniform object model is hard)

Eric Wieser (Apr 01 2024 at 23:17):

I guess my question is really then "is there any precedent for using the combination of the names Array and Vector in this way"?

Mario Carneiro (Apr 01 2024 at 23:25):

Another point in favor of using Vector for the fixed length array type instead of the dynamic length one is that arrays of length n actually form a vector space (assuming the elements of the array are a ring or module), unlike dynamic length arrays (you have to do something funny with truncating or extending to fit if you want to add arrays of different sizes, and this process means that the space of all Array A is not a vector space without either subtyping it or quotienting it first).

Mario Carneiro (Apr 01 2024 at 23:28):

In fact, I believe the obvious vector space structure on Vector A n is exactly how the word "vector" first came to be associated with these long lists of numbers in computing (I'm thinking of examples like vector registers). Things like std::vector<T> are then so named because they "generalize" the concept of these fixed length lists, in the process losing the "vector space" part

Eric Wieser (Apr 01 2024 at 23:30):

Apparently in C++

It's called a vector because Alex Stepanov, the designer of the Standard Template Library, was looking for a name to distinguish it from built-in arrays. He admits now that he made a mistake, because mathematics already uses the term 'vector' for a fixed-length sequence of numbers. C++11 compounds this mistake by introducing a class 'array' that behaves similarly to a mathematical vector.

Eric Wieser (Apr 01 2024 at 23:33):

Should the titular object be called Array.Vector to leave room for List.Vector? Perhaps the former could be exported to allow Vector to refer to it?

Mario Carneiro (Apr 01 2024 at 23:33):

yeah I was hoping we would get a name that generalizes more clearly to the list case

Mario Carneiro (Apr 01 2024 at 23:34):

I was thinking of proposing something like LVector for the mathlib type

Mario Carneiro (Apr 01 2024 at 23:35):

We don't have much precedent for putting types in the namespace of other types when they are both top level data structures like this

Eric Wieser (Apr 01 2024 at 23:38):

String.Pos comes to mind, but "top level" is somewhat subjective

Mario Carneiro (Apr 01 2024 at 23:39):

There are examples which are like preconditions or properties on their parent (e.g. HashMap.WF), but in such cases there is usually a clear hierarchy between the notions

Mario Carneiro (Apr 01 2024 at 23:41):

same thing with String.Pos, which although it does not itself have a type dependent over String, is only usable via methods in the String namespace and really only carries semantics with respect to a string (not even the type String but a particular string. Unfortunately the type doesn't really help with getting this association right).

Mario Carneiro (Apr 01 2024 at 23:42):

with Vector the idea is really that you can use the API without ever having to have or refer to an Array

Mario Carneiro (Apr 01 2024 at 23:49):

But more than anything else, I'm most interested in getting all the comments about the name out now so that we aren't blocked on it for too long. Having the type itself in some form is more important than getting the name right, and while I would prefer not to think about what a pain renaming it would be there are already name changes of similar impact in the pipeline, waiting for that time when core gets a naming czar or otherwise finds a way around the current stalemate.

Alex Keizer (Apr 02 2024 at 10:04):

Eric Wieser said:

What's the origin of the "vector" terminology for fixed length? Rust, C++, Go, and Haskell all seem to use Vector for dynamic-length objects.

I've seen Idris also use "Vector" for this type. Their docs say:

A standard example of a dependent data type is the type of “lists with length”, conventionally called vectors in the dependent type literature.

Alex Keizer (Apr 02 2024 at 10:04):

So there is precedent for the use of Vector, at least. Not sure about the Array part

Mario Carneiro (Apr 02 2024 at 10:06):

I would have preferred we use the name Array for non-dynamic arrays (we don't have these in lean right now, but the API would be something like Haskell's Array type)

Tomas Skrivan (Apr 02 2024 at 14:15):

In my library, I use ArrayN, ListN and DataArrayN which are all variants with the size specified in the type. I personally prefer suffix from prefix as it helps with discoveribility together with autocomplete. (DataArray is an array that stores only types with fixed byte size and uses one contiguous block of memory)

Tomas Skrivan (Apr 02 2024 at 14:19):

Also I do not index them by Fin n but by a generic type I which has an instance of IndexType I(class from LeanColls)

Mario Carneiro (Apr 02 2024 at 14:32):

you can add your suggestion to the list, but I guess it's probably past the initial rush at this point

Mario Carneiro (Apr 02 2024 at 14:34):

also keep in mind that if we did use those names then you'd have to change the name of your structures, since this one definitely won't be using IndexType

Joachim Breitner (Apr 02 2024 at 19:53):

Using Vector for this in lean, will that set precedence for “We are happy to use names in their common non-mathematical, possibly CS inspired sense, even if it overlaps with a common mathematics terms”? I might expect some beginners approach lean and mathlib from the math side might be a bit surprised that there is a Vector that has very little to do with the mathematical notion with that name. (Probably not a big deal, but I’d like to highlight it here.)

Mario Carneiro (Apr 02 2024 at 19:55):

I think the discussion just above indicates the opposite? C++ is the one that "got it wrong", as the quote from Stepanov above indicates, and although we clearly arrived at the opposite naming by some different path-dependence, that seems better on the whole for us

Mario Carneiro (Apr 02 2024 at 19:57):

Granted, Vector is still only loosely related to vector spaces and still has a significant CS bent, but it seems reasonable for a language straddling the verified software / theorem proving boundary like Lean

Mario Carneiro (Apr 02 2024 at 19:57):

I don't expect us to actually have a VectorSpace R (Vector R n) instance, although I suppose it's conceivable

Mario Carneiro (Apr 02 2024 at 20:01):

But if we view this as a diff from current lean/std/mathlib, it's actually not that big: Vector A n from mathlib got upstreamed along with an implementation-detail data structure swap of List to Array for better performance

Mario Carneiro (Apr 02 2024 at 20:01):

and another type Vector2 A n (name TBD) was added to mathlib with the old implementation for backward compatibility

James Gallicchio (Apr 03 2024 at 00:46):

Alex Keizer said:

Do we want to think about what the name should be for Lists of fixed length as well?

i personally quite like having a single FixSize transformer that works for any collection that has a size, but this is all part of the LeanColls abstraction brainrot...

Joe Hendrix (Apr 03 2024 at 04:57):

I am a little surprised how much discussion there is on this. Vector is used in Coq. I used Vec in the code linked above since AES (and other crypto) tends to need that type a lot and a concise name was useful. The AES specification came from Cryptol which uses [n]tp to denote a vector of length n with elements of type tp, but that notation doesn't work in Lean.

My personal preference is Vector or Vec since that terminology is fairly widely and teven if there is no explicit vector space, they are closely related to vectors in a mathematical sense.

Kim Morrison (Apr 03 2024 at 05:59):

I'm happy with Vector.

Kim Morrison (Apr 03 2024 at 05:59):

The usage of Vector in Mathlib is fairly incidental and if there are any collisions Mathlib can move.

Mario Carneiro (Apr 03 2024 at 07:54):

@Joe Hendrix The Vector used in Coq is not the same. It is actually a type we use frequently as an example of an inductive family, but we don't actually define it in mathlib (IIRC), this is what one of the vector2 vector3 types was about. It has some aspects that make it less desirable than mathlib's Vector type, but it has approximately the same layout: a linked list, but with length stored in each node in addition. This is isomorphic to the fixed-length array type being proposed here but quite different in its implementation. The Coq type is:

inductive Vector (α : Type u) : Nat → Type u
  | nil : Vector α 0
  | cons : α → Vector α n → Vector α (n + 1)

Mario Carneiro (Apr 03 2024 at 08:00):

The type [n]tp in Cryptol, IIUC, is most similar to C++ std::array<T, N> or rust [T; N], which is to say a non-dynamic array with a statically known size. quoting from "Programming Cryptol":

This is probably obvious, but there is no way to get a value variable to appear in a type context. Types must be
known at “compile time,” and (non-literal) values are not, so there is no way to use them in that way.

In other words, although you can use "constexpr" style evaluation inside the size of an array, it's not truly a dependent type, and it is fixed at compile time.

Mario Carneiro (Apr 03 2024 at 08:03):

Now I'm not too fussed about this difference in semantics between Lean and other languages with compile-time const generics that look vaguely DTT-like at first glance, because Lean does not really have "constexpr" (although maybe it might get it someday when the compiler gets good :tm: ), but I would say it is at least a concern worth registering that we are potentially blocking a future type that will want this "Vector" name which is closer to the vector type in the aforementioned languages.

Shreyas Srinivas (May 11 2024 at 17:21):

What's the latest news on this? Is there a decision to include arrays with static length?

Mario Carneiro (May 11 2024 at 17:22):

awaiting implementation

Shreyas Srinivas (May 11 2024 at 17:23):

In what ways should this be different from the Vector a n type in Mathlib?

Mario Carneiro (May 11 2024 at 17:24):

isomorphic, but based on arrays instead of lists

Shreyas Srinivas (May 11 2024 at 17:25):

Okay. I can get cracking on this. It will be useful for fair division problems. I need a faster vector type to manipulate allocations

Shreyas Srinivas (May 11 2024 at 17:35):

And also for some circuit reasoning

Shreyas Srinivas (May 11 2024 at 17:47):

@Mario Carneiro : Should I pick the name Vector?

Shreyas Srinivas (May 11 2024 at 23:12):

I have started a PR at batteries#793 @Mario Carneiro

Shreyas Srinivas (May 11 2024 at 23:13):

It is by no means complete, but it attempts to be a "port" of Mathlib.Data.Vector as much as possible. Most (all?) remaining errors are due to Mathlib dependencies

Shreyas Srinivas (May 11 2024 at 23:14):

(deleted)

Shreyas Srinivas (May 11 2024 at 23:34):

The only remaining errors are in Section Accum. I am guessing it might make sense to add toArray versions of the toList theorems.

Shreyas Srinivas (May 11 2024 at 23:36):

I would also like to move everything under ### Basic Theorems to the corresponding Lemmas.lean file. Additionally there is the question of whether the list based definitions of functions should be changed to use Array functions where possible.

Shreyas Srinivas (May 11 2024 at 23:36):

I want to ask first whether all this is fine, before making any further changes.

Kim Morrison (May 12 2024 at 06:13):

Make sure to looks at Joe's existing code https://leanprover.zulipchat.com/#narrow/stream/348111-batteries/topic/RFC.3A.20Arrays.20with.20fixed.20length/near/430166031 above.

Kim Morrison (May 12 2024 at 06:14):

The name decision seems to be have been (in favour of Vector) at https://leanprover.zulipchat.com/#narrow/stream/348111-batteries/topic/RFC.3A.20Arrays.20with.20fixed.20length/near/430188516.

Eric Wieser (May 12 2024 at 08:09):

Does it make sense to upstream Vector as-is first, so that it can be replaced in-place (producing a clearer diff)?

Eric Wieser (May 12 2024 at 08:10):

It's annoying when upstreaming causes my code to break, and there is no diff to look at that suggests why (beyond the file being completely deleted)

Mario Carneiro (May 12 2024 at 08:19):

I wasn't really thinking of this being an upstreaming of mathlib's Vector type. I think the natural operations are a bit different when using lists vs arrays

Shreyas Srinivas (May 12 2024 at 08:20):

I am trying to salvage as much as possible since you mentioned that you wanted them isomorphic

Mario Carneiro (May 12 2024 at 08:20):

In this case it most certainly is not such an upstreaming because the underlying type and defeqs are changing. The mathlib Vector type will stay as it is, except for a rename

Yaël Dillies (May 12 2024 at 08:21):

Is anybody actually using the mathlib Vector?

Mario Carneiro (May 12 2024 at 08:21):

not really

Mario Carneiro (May 12 2024 at 08:22):

sources of inspiration for the API include mathlib's Vector, the Array API, and Joe's Vec

Shreyas Srinivas (May 12 2024 at 08:22):

My next goal is to change existing operations from list to array operations where there are complexity speedups for doing so or where proofs can be repaired to work easily

Shreyas Srinivas (May 12 2024 at 08:23):

I don't know if it makes sense to array-fy List.replicate for example.

Mario Carneiro (May 12 2024 at 08:23):

I'm confused, what is the definition of Vector in your PR?

Shreyas Srinivas (May 12 2024 at 08:23):

Array whose size is equal to n

Mario Carneiro (May 12 2024 at 08:24):

So in that case all the ops should be array ops, no?

Shreyas Srinivas (May 12 2024 at 08:24):

I initially made everything work with the underlying list to check what works. Then I replaced a few operations with array. Doing the same for the remaining ones next

Mario Carneiro (May 12 2024 at 08:25):

The actual implementation should not be using Array.mk and Array.toList in most cases

Mario Carneiro (May 12 2024 at 08:25):

these are expensive operations

Shreyas Srinivas (May 12 2024 at 08:26):

Yeah, hence replacing them one by one

Shreyas Srinivas (May 12 2024 at 08:26):

And then going around fixing all the proofs that break

Mario Carneiro (May 12 2024 at 08:26):

I think it would be easier to just take array operations and tuple them with theorems about what they do to the size

Shreyas Srinivas (May 12 2024 at 08:28):

Before I go further, there is a part of the file adapted from Mathlib's vector that won't work because of mathlib dependencies. Should I comment those out for now, or should their upstreaming also happen in this PR?

Mario Carneiro (May 12 2024 at 08:28):

What part?

Shreyas Srinivas (May 12 2024 at 08:28):

The part under section Accum

Shreyas Srinivas (May 12 2024 at 08:29):

There are a couple of lemmas from linear orders

Shreyas Srinivas (May 12 2024 at 08:29):

And the definition of mapAccumr

Mario Carneiro (May 12 2024 at 08:30):

Why does mapAccumr need theorems?

Shreyas Srinivas (May 12 2024 at 08:30):

Not sure it does.

Mario Carneiro (May 12 2024 at 08:31):

All the theorems should be removed from Batteries.Data.Vector.Basic and moved to Batteries.Data.Vector.Lemmas

Shreyas Srinivas (May 12 2024 at 08:31):

Okay. Got it. That was another of my questions

Shreyas Srinivas (May 12 2024 at 08:32):

@Yaël Dillies : I tried using it, but it is API-wise not as good as Array

Mario Carneiro (May 12 2024 at 08:33):

make sure you don't copy the bad API decisions too...

Shreyas Srinivas (May 12 2024 at 08:34):

Okay, maybe I read too much into the word isomorphic up there

Shreyas Srinivas (May 12 2024 at 08:34):

Should there also be induction theorems based on indices

Shreyas Srinivas (May 12 2024 at 08:34):

And translations to some for of multiset of items in the array?

Mario Carneiro (May 12 2024 at 08:34):

You mean like an ext lemma? Yes

Shreyas Srinivas (May 12 2024 at 08:34):

(can we do that at the moment?)

Shreyas Srinivas (May 12 2024 at 08:35):

Sort of yes. I would also like an induction principle on array length

Mario Carneiro (May 12 2024 at 08:35):

Batteries doesn't have multisets and the vector structure isn't obviously helpful

Shreyas Srinivas (May 12 2024 at 08:35):

It's useful for algorithms to at least have sets

Mario Carneiro (May 12 2024 at 08:36):

Ok but that's a separate discussion

Shreyas Srinivas (May 12 2024 at 08:37):

There are match pattern functions like head and cons. Does it make sense to create and use array versions of them?

Mario Carneiro (May 12 2024 at 08:37):

not really

Mario Carneiro (May 12 2024 at 08:38):

head is fine

Shreyas Srinivas (May 12 2024 at 12:15):

There are a couple of API defs in Prelude that seem to be primarily for use inside metaprogramming. Namely :

Array.appendCore {α : Type u} (as : Array α) (bs : Array α) : Array α
Array.extract (as : Array α) (start stop : Nat) : Array α

Do we want these definitions for Vector?

Shreyas Srinivas (May 12 2024 at 12:33):

Isn't it better to have a slice structure on top of Array and Vector, and iterators on top of them?

Shreyas Srinivas (May 12 2024 at 12:34):

Another question: Where is the array notation defined?

Mario Carneiro (May 12 2024 at 12:34):

extract yes, appendCore probably not

Shreyas Srinivas (May 12 2024 at 12:47):

Added extract. Next I want to add array notation for vectors. Where is this notation defined?

Mario Carneiro (May 12 2024 at 12:56):

https://github.com/leanprover/lean4/blob/7db8e6482e1dc46fd1070b3bf549112f02a4c05e/src/Init/Data/Array/Basic.lean#L548-L551

Shreyas Srinivas (May 12 2024 at 13:12):

I made a quick adaptation of that syntax as below:

import Batteries

namespace Vector
def Vector (α : Type u) (n : Nat) := {a : Array α // a.size = n}

/--
  `fromArray (a : Array α)` produces a `Vector of the appropriate size
-/
def fromArray (a : Array α) : Vector α (a.size) := ⟨a, by simp⟩

syntax "#v[" withoutPosition(sepBy(term, ", ")) "]" : term

macro_rules
  | `(#v[ $elems,* ]) => `(Vector.fromArray #[ $elems,* ])

#check #v[1,2,3,3,4,4]

end Vector

Shreyas Srinivas (May 12 2024 at 13:13):

That's an mwe. at the check one gets a line like this fromArray #[1, 2, 3, 3, 4, 4] : Vector Nat #[1, 2, 3, 3, 4, 4].size

Shreyas Srinivas (May 12 2024 at 13:13):

Is there a way to force that size in the type signature to evaluate?

Mario Carneiro (May 12 2024 at 13:15):

you could expand to Vector.mk (n := $(elems.size)) #[$elems,*] rfl

Shreyas Srinivas (May 12 2024 at 13:23):

Error :

invalid field 'size', the environment does not contain 'Lean.Syntax.TSepArray.size'
elems
has type
Lean.Syntax.TSepArray `term ","Lean 4

Mario Carneiro (May 12 2024 at 13:25):

.getElems.size

Shreyas Srinivas (May 12 2024 at 13:31):

Doesn't work. size isn't found. Isn't is simply Array.size from Init?

import Batteries.Data.Array
import Batteries.Data.List.Basic
import Batteries.Data.List.Lemmas

namespace Vector

open Array Lean Meta

/--
  `Vector α n` is an `Array α` whose size is statically fixed to `n`
-/
def Vector (α : Type u) (n : Nat) := {a : Array α // a.size = n}

/--
  `fromArray (a : Array α)` produces a `Vector α n` of the appropriate size
-/
def fromArray (n : Nat) (a : Array α)  (h : n = a.size): Vector α n := ⟨a, by {simp[h]}⟩

/--Syntax for `Vector α n`-/
syntax "#v[" withoutPosition(sepBy(term, ", ")) "]" : term

macro_rules
  | `(#v[ $elems,* ]) => `(Vector.fromArray (n := elems.getElems.size) #[$elems,*] rfl)

#check #v[1,2,3,3,4,4] -- unknown identifier 'elems.getElems.size✝'

end Vector

Mario Carneiro (May 12 2024 at 13:32):

you are missing the antiquotation

Shreyas Srinivas (May 12 2024 at 13:36):

Now it works but the size function remains unreduced

Mario Carneiro (May 12 2024 at 13:36):

Shreyas Srinivas (May 12 2024 at 13:36):

import Batteries.Data.Array
import Batteries.Data.List.Basic
import Batteries.Data.List.Lemmas

namespace Vector

open Array Lean Meta

/--
  `Vector α n` is an `Array α` whose size is statically fixed to `n`
-/
def Vector (α : Type u) (n : Nat) := {a : Array α // a.size = n}

/--
  `fromArray (a : Array α)` produces a `Vector α n` of the appropriate size
-/
def fromArray (n : Nat) (a : Array α)  (h : n = a.size): Vector α n := ⟨a, by {simp[h]}⟩

/--Syntax for `Vector α n`-/
syntax "#v[" withoutPosition(sepBy(term, ", ")) "]" : term

set_option pp.proofs true
macro_rules
  | `(#v[ $elems,* ]) => `(Vector.fromArray (n := #[$elems,*].size) #[$elems,*] rfl)

#check #v[1,2,3,3,4,4] -- can't reduce size

end Vector

Shreyas Srinivas (May 12 2024 at 13:37):

Should I make it an explicit parameter?

Mario Carneiro (May 12 2024 at 13:37):

/--
  `Vector α n` is an `Array α` whose size is statically fixed to `n`
-/
structure Vector (α : Type u) (n : Nat) where
  toArray : Array α
  size_eq : toArray.size = n

/--Syntax for `Vector α n` -/
syntax "#v[" withoutPosition(sepBy(term, ", ")) "]" : term

macro_rules
  | `(#v[ $elems,* ]) => `(Vector.mk (n := $(quote elems.getElems.size)) #[$elems,*] rfl)

#check #v[1,2,3,3,4,4]

Shreyas Srinivas (May 12 2024 at 13:39):

Thanks. This quotation syntax remains a bit of wizardry for me

Shreyas Srinivas (May 12 2024 at 14:25):

I am now adding the defs in Init.Data.Array.Basic. This is a very large file, so I expect this task to take a few days.

Frédéric Dupuis (May 12 2024 at 16:28):

(deleted)

Kim Morrison (May 12 2024 at 22:16):

I think it works be good to PR the deprecated tag to Mathlib's Vector
sooner rather than later.

Kim Morrison (May 12 2024 at 22:17):

I assumed this is going to Batteries, at least at first?

Shreyas Srinivas (May 12 2024 at 22:17):

This PR is going to batteries. It is far from done. There is a lot of stuff from Arrays, theorems relating to both Arrays and toList, search, sort etc, that is to be added.

Shreyas Srinivas (May 12 2024 at 22:18):

To be on the safer side I'd say this PR won't be done until next weekend, although I might bring it to feature parity with Array and Mathlib.Vector sooner than that

Shreyas Srinivas (May 12 2024 at 22:20):

Once all the pieces are in place. There is some serious reorganisation needed. The defs have to be organised better. Array is spread out all over Init and batteries, and many docstrings are missing. For Vector I want to ensure that at least all defs have a docstring.

Shreyas Srinivas (May 12 2024 at 22:22):

Also, at least two sections from The Mathlib Vector file depend on defs in Mathlib.

Shreyas Srinivas (May 12 2024 at 22:23):

They can't be used as is for the Vector I am designing anyway, but the proofs for the Vector will probably have similar dependencies.

Eric Wieser (May 12 2024 at 22:59):

Is the name going to be Batteries.Vector or _root_.Vector?

Shreyas Srinivas (May 12 2024 at 22:59):

I am putting this in Batteries. It is upto maintainers to decide where it gets moved

Mario Carneiro (May 12 2024 at 23:03):

Good question. I was initially leaning toward _root_.Vector, but then Batteries got booted from leanprover/ so I'm not so sure anymore. Most of the other data structures are in the Batteries namespace

Mario Carneiro (May 12 2024 at 23:05):

It also solves the name clash issue, although not in a very satisfactory way

Mario Carneiro (May 12 2024 at 23:07):

On the other hand this file will almost certainly be consumed by core once it matures, so putting it in _root_ will mean one less rename later

Shreyas Srinivas (May 12 2024 at 23:14):

Array has functions like popWhile and takeWhile for which no reasonable counterpart exists for Vector a n, because one wouldn't know the size of the output in advance

Mario Carneiro (May 12 2024 at 23:14):

then skip them

Kim Morrison (May 12 2024 at 23:14):

Let me consult re: namespacing

Kim Morrison (May 13 2024 at 06:03):

There is general enthusiasm in the FRO for working on this as Batteries.Vector rather than _root_.Vector.

François G. Dorais (May 13 2024 at 07:55):

Is there a reason to use a subtype? Personally, I would use

structure Vector (α : Type _) (n : Nat) extends Array α where
  ofArray ::
  size_eq : toArray.size = n

to avoid indirection.

Note that this allows to transport many Array ops cleanly and efficiently:

@[inline] def Vector.push : Vector α n → α → Vector α (n+1)
  | ⟨a, h⟩, x => ⟨a.push x, h ▸ Array.size_push ..⟩

Quick style note: Batteries, like core, loves auto implicit, so {α : Type u} should be avoided.

Kim Morrison (May 13 2024 at 07:57):

Yes, should definitely be a structure.

François G. Dorais (May 13 2024 at 07:57):

(I should have read the entire thread rather than just the PR. I see that Mario (and now Kim) had a similar suggestion above.)

Shreyas Srinivas (May 13 2024 at 08:02):

This is also possible with subtypes

Shreyas Srinivas (May 13 2024 at 08:03):

I used it at a few places, but sometimes it is cleaner to write the proof separately

Shreyas Srinivas (May 13 2024 at 08:05):

Oh wait, you mean indirection in memory access?

Shreyas Srinivas (May 13 2024 at 08:08):

Why would there be more indirection with Subtypes? Subtypes are also structures

François G. Dorais (May 13 2024 at 08:18):

There's no difference since Subtype is very thin, but one advantage is that you can extend a structure.

François G. Dorais (May 13 2024 at 08:19):

It depends on the situation but I don't see the advantage of using Subtype here.

Shreyas Srinivas (May 13 2024 at 08:19):

I used it for brevity and the fact that there will be all sorts of free instances

Shreyas Srinivas (May 13 2024 at 08:19):

Coercion for one

Shreyas Srinivas (May 13 2024 at 08:21):

Also, I use a subtype theorem in the proof of the extAux theorem

Eric Wieser (May 13 2024 at 08:25):

Shreyas Srinivas said:

Also, I use a subtype theorem in the proof of the extAux theorem

Adding an Equiv with subtypes (in mathlib) makes this still possible

Eric Wieser (May 13 2024 at 08:25):

I think the "no subtypes" ship already sailed with Fin no longer being a subtype

Shreyas Srinivas (May 13 2024 at 08:29):

Okay. I will probably refactor in the evening (it's morning where I am)

Shreyas Srinivas (May 13 2024 at 09:29):

Eric Wieser said:

Adding an Equiv with subtypes (in mathlib) makes this still possible

Loogle suggests that Equiv is defined in Mathlib so I can't use it here.

Shreyas Srinivas (May 13 2024 at 09:29):

@loogle Equiv

loogle (May 13 2024 at 09:29):

:search: Equiv, Equiv.propEquivBool, and 6186 more

Shreyas Srinivas (May 13 2024 at 09:29):

I am guessing equivalent types have a different name (iso?)

Eric Wieser (May 13 2024 at 09:36):

I don't understand your question.

Shreyas Srinivas (May 13 2024 at 09:36):

I assumed you were suggesting that I use Equiv like mathlib. So I looked it up and found that it was also defined there, so I couldn't use it.

Eric Wieser (May 13 2024 at 09:37):

I meant put those results downstream in mathlib, because that's where Equiv is

Shreyas Srinivas (May 13 2024 at 09:37):

The ext lemma should be in batteries

Eric Wieser (May 13 2024 at 09:37):

Sure, but that lemma is trivial anyway

Shreyas Srinivas (May 13 2024 at 09:37):

But the theorem I am using is probably a one-liner, so it doesn't matter

Shreyas Srinivas (May 13 2024 at 10:40):

I'd welcome any help with map_cons in Data.Vector.Lemmas

Shreyas Srinivas (May 13 2024 at 10:40):

It seems to go into monad world with Array.mapM and then I am lost

Shreyas Srinivas (May 13 2024 at 10:44):

My best guess is there is some theorem that can rewrite the Array.map based equality with something based on List

Henrik Böving (May 13 2024 at 11:13):

You shouldn't need to work with the mapM part of the API, the Array.map API is rich enough to support the vast majority of use cases and if it is not the Array.map API should be extended.

Additionally your implementation of cons seems to be suboptimal to me. What this function will do is: Take a Vec, turn it into a List, List.cons an element to that list and then turn that List into a Vec.

I believe it would already be more efficient to just do #[newHead] ++ xs. If you write it like that you will also stay inside of the Array API the whole time instead of switching between Array and List which will make your proofs easier.

Shreyas Srinivas (May 13 2024 at 11:16):

Is that useful proof-wise though?

Shreyas Srinivas (May 13 2024 at 11:16):

I don't think cons is meant to be used in code anyway.

Shreyas Srinivas (May 13 2024 at 11:17):

In arrays, consing on the end is more efficient (what Haskell calls last iirc)

Shreyas Srinivas (May 13 2024 at 11:17):

There is a push function for this purpose

Kim Morrison (May 13 2024 at 11:18):

Shreyas Srinivas said:

I don't think cons is meant to be used in code anyway.

So, delete it, or, if there is reason to keep it, switch to Henrik's suggested implementation.

Shreyas Srinivas (May 13 2024 at 11:18):

It was part of the mathlib vector API

Kim Morrison (May 13 2024 at 11:19):

:woman_shrugging: You don't need to copy everything. If something is bad, leave it out.

François G. Dorais (May 13 2024 at 11:23):

Shreyas Srinivas said:

It was part of the mathlib vector API

Oh! That's why you keep falling back on lists! You should revise that plan. All of the proofs you need should follow easily from the Array API. If that's not the case then either something is missing in the Array API or it doesn't belong in the Vector API. (Since the Array API is well tested, the latter is much more likely.)

François G. Dorais (May 13 2024 at 11:25):

I would start with just the functions for Vector.Basic. Don't waste too much time on lemmas that might be deleted anyway.

Shreyas Srinivas (May 13 2024 at 11:27):

I have replaced most list based defs with array functions. The remaining ones will be changed as I port more functions from Init.Data.Array files

Shreyas Srinivas (May 13 2024 at 11:28):

The other reason I use lists in one or two proofs is that either they are theorems about toList from Mathlib's Vector

Shreyas Srinivas (May 13 2024 at 11:31):

I intend to write toArray versions for those theorems, but it seems sensible to keep the toList versions for backward compatibility

François G. Dorais (May 13 2024 at 12:29):

Awesome work @Shreyas Srinivas! You seem to be on a roll. I'll be back later today for more reviews.

Shreyas Srinivas (May 13 2024 at 13:18):

I'll add further functions from Init.Data.Array.Basic once I satisfy all existing reviews. There are still all those monadic functions to adapt.

Shreyas Srinivas (May 13 2024 at 13:18):

I have removed all the lemmas from Mathlib.Vector which are list based for now. If needed they can be added back.

François G. Dorais (May 13 2024 at 13:35):

Excellent! Looks like there needs an import but everything compiles!

Shreyas Srinivas (May 13 2024 at 13:37):

I think it makes sense to split this PR. What is added now should be usable outside monadic contexts. Subsequent PRs would be a good way to add :

Monadic defs : e.g. foldl, foldr, filter etc.
Theorems
Stuff from other files like binary search and sort

Shreyas Srinivas (May 13 2024 at 13:38):

Looking at the PR page, it looks rather cluttered as it is.

Shreyas Srinivas (May 13 2024 at 13:39):

If the consensus is that I continue working on everything in this PR, that's also fine.

Shreyas Srinivas (May 13 2024 at 13:43):

Another idea I had but would like to ask first : Monadic functions and the other functions that depend on monadic functions could go into a separate file. It seems useful both from an organisation perspective and from a parallel build perspective.

Shreyas Srinivas (May 13 2024 at 13:58):

There's an Array.size_insertAt missing from the Array API

François G. Dorais (May 13 2024 at 14:07):

There's a CI error that Batteries.Data.Vector is not imported into Batteries, so you need to add some imports to Batteries.Data and Batteries.Data.Vector.

A sweep to check spacing and indentation would be nice but that's not critical.

I think it's okay to split into smaller PRs. I don't think this file is a compilation bottleneck so monadic functions could be in the same file but I think there is no harm in having separate files. I would say that's up to you.

Shreyas Srinivas (May 13 2024 at 14:09):

I think this last commit would pass CI minus lints

François G. Dorais (May 13 2024 at 14:12):

Ideally, Array.size_insertAt goes in core but it can be parked in Array.Lemmas for the time being.

François G. Dorais (May 13 2024 at 14:15):

Lint errors don't look too bad.

Shreyas Srinivas (May 13 2024 at 14:17):

I think it makes sense to have a separate PR to modify that file?

François G. Dorais (May 13 2024 at 14:18):

If you need to use Array.size_insertAt in this PR, then add it to this PR. If it's only needed for later then another quick PR is fine.

Shreyas Srinivas (May 13 2024 at 14:22):

Two of the unused argument lints make no sense. You ideally want an ambient vector whose size you are asking in the size function. Something similar for isEmpty function, even if the information is available statically without them

François G. Dorais (May 13 2024 at 14:27):

I'm not sure isEmpty is needed for vectors.

Shreyas Srinivas (May 13 2024 at 14:27):

That's a fair point

Shreyas Srinivas (May 13 2024 at 14:30):

But for size, one would still want an ambient vector

François G. Dorais (May 13 2024 at 14:30):

Just disable the linter for Vector.size, it's a bogus one.

Shreyas Srinivas (May 13 2024 at 15:22):

Disabled and pushed.

Shreyas Srinivas (May 13 2024 at 16:03):

I think I have now added all the non-monadic functions from Init.Data.Array.Basic to Basic.lean (except zip, but the zip functions belong with unzip which uses foldl). The build workflow needs approval, but I am guessing the linter will have no further issues.

Shreyas Srinivas (May 13 2024 at 16:06):

Next question : what attributes need adding? Which functions get @[inline] and @[specialize] etc. What do these attributes do?

Shreyas Srinivas (May 13 2024 at 16:12):

I think once this PR builds at this point, it should be closed. I will open another PR for adding monadic functions, and then lemmas

Shreyas Srinivas (May 13 2024 at 16:24):

Also, every file with meaningful content seems to have some authorship comments at the top. What's the norm for this?

Shreyas Srinivas (May 13 2024 at 16:48):

Sidenote : Found some typos in the Lint comments I believe these linter names are now camelCase.

François G. Dorais (May 14 2024 at 04:01):

Shreyas Srinivas said:

Next question : what attributes need adding? Which functions get @[inline] and @[specialize] etc. What do these attributes do?

The @[inline] attribute should be applied to any def that is a thin wrapper for a lower level function. (Also special cases for performance issues but that's usually not to worry about until there is an actual performance issue.) Lean should probably do automatic inlining for many of those but that is not quite implemented yet as far as I know. (The backend does some of that automatically though.) When your def is a thin wrapper then you might want to make it an abbrev, which makes it both @[inline] and @[reducible] (and some extra kernel stuff). Sometimes @[reducible] is not desirable, then just do @[inline] instead of abbrev.

The @[specialize] attribute can be used when your def has a "generic" parameter. For example, in Array.map the function f is a generic parameter. The @[specialize] attribute instructs the compiler to generate a separate function for each instance of f (when that is possible). This should be used if the primary use of the def is to be used repeatedly with the same generic parameter. This is not actually as common as you might think.

Shreyas Srinivas (May 15 2024 at 12:32):

batteries#793 is up for community review. Francois has conducted a rigorous and thorough review already.

Shreyas Srinivas (May 15 2024 at 15:12):

Hi everyone. I need some opinions here

Shreyas Srinivas (May 15 2024 at 15:14):

It turns out that Init doesn't have Array definitions with safe Nat indices (i.e. the proof of i < n for index i and Array size n is supplied with get_elem_tactic as a default argument). But it would be nice to have such definitions for Vector, piggy-backing on the Array API with Fin indexes ofc.

Shreyas Srinivas (May 15 2024 at 15:14):

The question is, what should the naming convention for these functions be

Shreyas Srinivas (May 15 2024 at 15:16):

Example of such a function :

def swapNat (v : Vector α n) (i j : Nat) (h_i : i < n := by get_elem_tactic) (h_j : j < n := by get_elem_tactic) : Vector α n :=
  ⟨Array.swap v.toArray (v.size_eq.symm ▸ ⟨i, h_i⟩) (v.size_eq.symm ▸ ⟨j, h_j⟩), proof⟩
  where
    proof := by
      rw [size_swap]
      exact v.size_eq

I don't like the name swapNat. Maybe swapSafeNatIndex conveys more meaning, but makes names too long.

Yaël Dillies (May 15 2024 at 15:16):

You could use SafeIdx maybe?

Shreyas Srinivas (May 15 2024 at 15:20):

That sounds nice. @Frédéric Dupuis proposed swapN

Frédéric Dupuis (May 15 2024 at 15:28):

I think we really should go with a short name, I suspect this is actually the most useful version in the long run (i.e. in the middle of a for loop, this is what you'll want to use, etc).

Mario Carneiro (May 15 2024 at 15:42):

I think the "main" swap function should take Fin n indices, but this seems like a reasonable alternative variation

Mario Carneiro (May 15 2024 at 15:43):

swapN seems fine to me

Shreyas Srinivas (May 15 2024 at 16:39):

Done. There are now some ...N functions for vectors.

Shreyas Srinivas (May 15 2024 at 17:27):

Batteries#793 is ready for review again. I think I have addressed all existing comments

Shreyas Srinivas (May 15 2024 at 17:55):

Note : The definitions could also be added for Arrays, but they belong in the core's Array Basic file

François G. Dorais (May 15 2024 at 19:52):

There's no problem adding new functions in Batteries.Array.Basic.

Shreyas Srinivas (May 15 2024 at 23:39):

Done.

Shreyas Srinivas (May 16 2024 at 11:12):

This PR seems ready for merge. Once merged, I will start a PR for adding monadic array API to vectors

Shreyas Srinivas (May 16 2024 at 11:52):

Back during the port we had this #queue link on zulip for mathlib's PR queue (don't know if it still exists). Is there such a thing for batteries?

Kim Morrison (May 16 2024 at 11:53):

No, Batteries is typically much lower traffic, and the github PR tab is sufficient.

Shreyas Srinivas (May 16 2024 at 11:54):

okay. Makes sense. Mathlib might have been particularly high traffic during the port. EDIT: Oh wait the link still exists.

Kim Morrison (May 16 2024 at 11:54):

It remains extremely high traffic. Look at the #queue. :-)

Shreyas Srinivas (May 16 2024 at 12:06):

Btw, is there anything I need to do to change labels or something for the PR?

Kim Morrison (May 16 2024 at 12:15):

I haven't had time to look at this PR. I just opened it, and immediately see that you are introducing new definitions in the Array namespace, e.g. getN and following. These should be in their own PR, prior to introducing Vector.

Shreyas Srinivas (May 16 2024 at 12:17):

I can remove these. The vector definitions of functions like getN use other Array equivalents. I added them in as Francois suggested them here.

Shreyas Srinivas (May 16 2024 at 12:17):

The Vector API I have added can happily work with the existing Array API

Shreyas Srinivas (May 16 2024 at 12:18):

The shrink lemmas I added are the only ones this PR truly needs.

Shreyas Srinivas (May 16 2024 at 12:21):

Creating a separate PR that is added before this PR might be a bit of a headache since I am working on the main branch of my fork, and I am 102 commits ahead on my PR than the main branch of batteries.

Kim Morrison (May 16 2024 at 12:21):

Sorry? :-)

Shreyas Srinivas (May 16 2024 at 12:21):

I don't understand

Kim Morrison said:

Sorry? :-)

Kim Morrison (May 16 2024 at 12:22):

You need to do it anyway, and I'm sorry that it is a lot of work.

Shreyas Srinivas (May 16 2024 at 12:22):

To clarify, I mean can't I make a separate PR after?

Kim Morrison (May 16 2024 at 12:23):

Mario may feel otherwise, but I wouldn't be happy merging the PR as is because of (at least) this issue. (That said, I haven't had the time, and I won't anytime soon, to look at the rest of the PR, so I'm not going to be the one merging anyway.)

Shreyas Srinivas (May 16 2024 at 12:28):

Okay, but it is going to be difficult to anticipate every missing piece of the array API in advance. Although these particular defs are added for completeness sake and can be removed without any trouble, it is going to be a bit of a PR tangle if every array change needs to be a separate PR.

Shreyas Srinivas (May 16 2024 at 12:28):

@Mario Carneiro thoughts?

Mario Carneiro (May 16 2024 at 12:29):

I am currently at a conference and don't have much time to devote here this week. I'll try to drop in when I can

Mario Carneiro (May 16 2024 at 12:31):

I agree with Kim, if this is a large PR then it should be split up, and if you contribute to both Array and Vector then that's a natural split point

Mario Carneiro (May 16 2024 at 12:31):

Do not be afraid to create sequences of dependent PRs

Shreyas Srinivas (May 16 2024 at 12:33):

Okay, so here's a plan for this particular instance: I'll create a new PR for the five array defs I added and move the defs there. This should leave batteries#793 with only two small theorems for Array.shrink

Kim Morrison (May 16 2024 at 12:33):

(It's fine to have multiple in sequences PRs open --- you can point to later ones as justification for the the earlier ones.)

Shreyas Srinivas (May 16 2024 at 12:34):

Everything else (99%) that is new will be in Vector

Shreyas Srinivas (May 16 2024 at 12:34):

Would this take the PR closer to merge?

Shreyas Srinivas (May 16 2024 at 13:05):

How do I ensure that my new PR is created against an existing PR as opposed to a branch?

Shreyas Srinivas (May 16 2024 at 13:06):

The new PR has commits from the old PR of course. It was created by branching off the old PR's branch

Shreyas Srinivas (May 16 2024 at 13:12):

Update : batteries#793 does not add N defs anymore. Those are in batteries#800

Shreyas Srinivas (May 16 2024 at 13:15):

The only things batteries#793 adds to Array are two short two-liner theorems about shrink.

Shreyas Srinivas (May 16 2024 at 13:19):

I am at the limit of my git magic.

Yaël Dillies (May 16 2024 at 14:34):

You should mark batteries#793 as depending on batteries#800. See eg #12836 and #12845 for an example of how to do it

Yaël Dillies (May 16 2024 at 14:34):

batteries#793 should still contain the N defs. The point is that once batteries#800 is merged, those defs will disappear from the diff

Shreyas Srinivas (May 16 2024 at 14:35):

It does

Yaël Dillies (May 16 2024 at 14:35):

Sorry, edited my message for clarity

Shreyas Srinivas (May 16 2024 at 14:36):

The only difference between the PRs right now is that batteries#793 doesn't contain N defs and batteries#800 does

Yaël Dillies (May 16 2024 at 14:36):

Ah sorry, batteries#793 is the dependency and batteries#800 is the dependent. I thought it was the other way around

Shreyas Srinivas (May 16 2024 at 14:40):

Basically i took the branch of batteries#793, namely main and created a new branch Array_enhancements. At this point both these branches contained the N defs.
Then I deleted the N defs from the main branch in the next commit. So now we have :
* batteries#793 which doesn't contains N defs
* batteries#800 which does contain N defs.

Yaël Dillies (May 16 2024 at 14:41):

Okay, that works but I have a better workflow

Shreyas Srinivas (May 16 2024 at 14:42):

Maybe what works is if

I apply the last commit of 793 which removed N defs to 800. This leaves 800 without n defs but makes the merge easy.
and then add another fix commit which removes the N defs again from 800.

Shreyas Srinivas (May 16 2024 at 14:43):

I can accomplish a rebase of branch Array_enhancements on main. and then explicitly add N defs to 800

Yaël Dillies (May 16 2024 at 14:43):

You have PR X from branch A. Somebody asks you to split off changes C from it. Go to master and do git merge --squash A. Remove from the staged changes all changes but C. Commit. Create a branch B. Open PR Y. Mark X as depending on Y.

Yaël Dillies (May 16 2024 at 14:44):

This workflow assumes that PRs are squash-merged to master (which is the case for Batteries, right?)

Shreyas Srinivas (May 16 2024 at 14:44):

yes, but it might hurt reviews

Shreyas Srinivas (May 16 2024 at 14:44):

there are a 102 commits to squash

Shreyas Srinivas (May 16 2024 at 14:44):

And there are loads of review comments

Yaël Dillies (May 16 2024 at 14:44):

Usually it's fine to discard those if they have already been addressed

Yaël Dillies (May 16 2024 at 14:45):

Shreyas Srinivas said:

there are a 102 commits to squash

Is the way the changes are spread across commits relevant to the review?

Yaël Dillies (May 16 2024 at 14:45):

Mine usually aren't

Shreyas Srinivas (May 16 2024 at 14:48):

You might lose the decision history behind the design choices

Yaël Dillies (May 16 2024 at 14:49):

I don't think this information should be hidden in commits that are going to be squashed to master anyway. If you want it, write it in the PR description

Shreyas Srinivas (May 16 2024 at 14:53):

How did you mark the dependency?

Shreyas Srinivas (May 16 2024 at 14:54):

In my case the mergify bot just said it had problems with the matching hashes of the head of both PRs or something

Shreyas Srinivas (May 16 2024 at 14:54):

Specifically, I am searching the github PR page for a place where I can do this.

Shreyas Srinivas (May 16 2024 at 14:58):

I think got batteries#800 to be a strict extension of batteries#793

Shreyas Srinivas (May 16 2024 at 14:58):

commit history wise

Yaël Dillies (May 16 2024 at 15:00):

Shreyas Srinivas said:

How did you mark the dependency?

Look at the source code of the PR description of #12845

Shreyas Srinivas (May 16 2024 at 15:00):

That just adds a checkmark. I did that a few minutes ago.

Yaël Dillies (May 16 2024 at 15:01):

Uh maybe we don't have the bot adding the blocked-by-other-PR label

Shreyas Srinivas (May 16 2024 at 15:01):

I don't think I can add labels to batteries. Permissions issue I guess.

Yaël Dillies (May 16 2024 at 15:02):

@Mario Carneiro, could we make sure the PR process is as similar as possible between batteries and mathlib4?

Shreyas Srinivas (May 16 2024 at 15:03):

I think that's unlikely. Mathlib's PR process is a bit of an outlier because of some cloud cache generating business. Mario explained this to me long ago.

Yaël Dillies (May 16 2024 at 15:03):

Okay but the bot adding blocked-by-other-PR has nothing to do with caching

Shreyas Srinivas (May 16 2024 at 15:04):

That might have something to do with me being a first time contributor to batteries.

Yaël Dillies (May 16 2024 at 15:04):

No I really don't think it's a you problem

Mario Carneiro (May 16 2024 at 15:07):

This sounds like @Kim Morrison 's jurisdiction. There is a bot that will let you apply awaiting-review/awaiting-author labels, and I think there is also a bot that adds/removes merge-conflict. I have not much knowledge about this aspect of the CI setup for mathlib so I am leaving this to others

Mario Carneiro (May 16 2024 at 15:08):

But Shreyas is correct that Batteries doesn't completely copy Mathlib's github setup specifically because of the permissions issue

Mario Carneiro (May 16 2024 at 15:10):

In the absence of automation, you can either poke someone with triage perms on Batteries to add the label, or live without it

Shreyas Srinivas (May 16 2024 at 15:12):

Currently batteries#793 has $\leq$ 10 lines of Array code in Array.Lemmas. This code (two tiny size theorems basically) was provided by @François G. Dorais during the review process.

Shreyas Srinivas (May 16 2024 at 15:14):

I hope this is not a big barrier for getting this PR through.

Shreyas Srinivas (May 16 2024 at 15:30):

I have added a separate PR for these two lemmas batteries#801

But in order to maintain buildability I haven't removed them from 793 yet. After batteries#801 is merged, I will do a git pull and merge.

The PRs need to be accepted in the following order : batteries#801 -> batteries#793 -> batteries #800
Reviews welcome.

Mario Carneiro (May 16 2024 at 15:35):

when you have dependent PRs, the dependent should be an end-extension of the dependency, i.e. it should contain all the commits and then some more at the end. If you take the middle commits out then it will generally break the dependent PR, so don't do that. (If it doesn't break the dependent PR then they are not actually dependent and should be separate PRs.)

Shreyas Srinivas (May 16 2024 at 15:37):

I think I fixed that issue between 793 and 800. 793's commit history should be a prefix of 800

Mario Carneiro (May 16 2024 at 15:39):

793 has a ton of commits though, could you squash it if it's not serving any review purpose?

Shreyas Srinivas (May 16 2024 at 15:40):

Okay. Afaik, all comments were addressed though one or two haven't been marked resolved officially

Shreyas Srinivas (May 16 2024 at 15:59):

Mario Carneiro said:

793 has a ton of commits though, could you squash it if it's not serving any review purpose?

Squashed it all into one commit.

Shreyas Srinivas (May 16 2024 at 16:06):

Both batteries#793 and batteries#800

Shreyas Srinivas (May 16 2024 at 16:18):

@Mario Carneiro I have rebased 793 (and consequently 800) onto 801. batteries#801 adds a couple of Array lemmas that batteries#793 (and consequently batteries#800) use

Shreyas Srinivas (May 16 2024 at 16:18):

This means the PRs should now have no problem getting merged correctly

Shreyas Srinivas (May 16 2024 at 16:20):

I need someone to run the CI on 793 but considering the CI automatically and succesfully runs on 800 which is top of this foodchain, I think the CI won't complain about 793 either

François G. Dorais (May 16 2024 at 16:26):

Since Shreyas is not a git pro, let's try to give very clear instructions. 793 is a bit of a mess now.

Shreyas Srinivas (May 16 2024 at 16:30):

I think I might have fixed it all. I will make a separate PR for improving the unexpander doc

Shreyas Srinivas (May 16 2024 at 16:31):

Currently, adding anything new to 793 might cause issues with 800

Shreyas Srinivas (May 16 2024 at 16:31):

For reference the docstring you proposed @François G. Dorais was

The definitions `mkVector*` provide convenient ways to write terms for short vectors, especially in metaprogramming. Unexpanders are provided make these shortcuts invisible to users.

Shreyas Srinivas (May 16 2024 at 16:43):

All 3 PRs are green

Shreyas Srinivas (May 16 2024 at 17:14):

Update : 3 PRs ready for review

Shreyas Srinivas (May 16 2024 at 17:42):

(deleted)

Shreyas Srinivas (May 17 2024 at 16:50):

Hi all, batteries#803 is independent of the other PRs and ready for review

Shreyas Srinivas (May 17 2024 at 16:58):

batteries#801 can also be reviewed separately

Shreyas Srinivas (May 18 2024 at 10:39):

Hi, I need a bit of help: I am ~~porting~~ adding the remaining mostly monadic defs from arrays to vectors in a soon-to-be new PR when I came across this function:

@[inline] def getEvenElems (as : Array α) : Array α :=
  (·.2) <| as.foldl (init := (true, Array.empty)) fun (even, r) a =>
    if even then
      (false, r.push a)
    else
      (true, r)

This seems an oddly specific function to have as opposed to, say, a general filterIdx function which carries the array index and the accumulator, instead of the boolean flag.

Is there a reason for doing things this way?

Shreyas Srinivas (May 18 2024 at 10:41):

Note: I am suggesting filterIdx as opposed to filter because a filter function already exists and doesn't carry the array index paired with the accumulator

Frédéric Dupuis (May 18 2024 at 10:41):

Indeed a filterIdx function would be much more useful.

Shreyas Srinivas (May 21 2024 at 12:55):

Shreyas Srinivas said:

Update : 3 PRs ready for review

batteries#803

batteries#801 -> batteries#793

Review request:
I am a bit nervous to chain more PRs on top of these. I would like to request reviews of these PRs so that they get merged before I add anymore. Not quite sure whom to ping.

Yuyang Zhao (May 21 2024 at 17:38):

Would it be possible later to change the runtime or have another type to not store size and capacity to make it more lightweight? I think most of the time we do not need to change its size, otherwise we should use Array?

Shreyas Srinivas (May 21 2024 at 17:53):

Vector is an Array with a static size parameter

Shreyas Srinivas (May 21 2024 at 17:54):

The runtime already supports dynamic arrays with some efficient C code

Henrik Böving (May 21 2024 at 17:57):

I think @Yuyang Zhao understands this. What they are asking for is the following: The way which your type currently looks:

/-- `Vector α n` is an `Array α` whose size is statically fixed to `n` -/
structure Vector (α : Type u) (n : Nat) where
  /--Internally, a vector is stored as an array for fast access-/
  toArray : Array α
  /--`size_eq` fixes the size of `toArray` statically-/
  size_eq : toArray.size = n
deriving Repr, BEq, DecidableEq

Means that the Array will store its size and its capacity in a lean_object + you additionally have to carry around the n parameter even though it is already present inside of the Array. The idea would be to have a specialized lean_object type that is basically represented in a way similar to yours in the logic but internally doesn't carry around a capacity and also doesn't carry around its current size twice. That would require changes to the runtime which in turn requires coordination with the FRO.

Shreyas Srinivas (May 21 2024 at 17:58):

Oh I see what you mean

Shreyas Srinivas (May 21 2024 at 17:58):

Apologies

Shreyas Srinivas (May 21 2024 at 17:59):

This would also require changes to the Vector API

Shreyas Srinivas (May 21 2024 at 17:59):

Currently there are a lot of functions which alter the size of the vector

Frédéric Dupuis (May 21 2024 at 18:02):

I'm not an expert on the Lean compiler, but my guess is that the performance gain would be minuscule compared to the effort it would require.

Henrik Böving (May 21 2024 at 18:08):

I think that's right. The memory footprint is only increased noticeably for very small arrays and the better cache should be miniscule enough for this not to matter.

Shreyas Srinivas (May 21 2024 at 18:13):

To add foolproof (i.e. non panicking) API functions, the Vector structure would effectively have to carry a parameter that serves the same role as capacity does for array and limit modifications that make vector size larger than this capacity, or use some null terminator like character whose position would have to be tracked. So the API would become painful.

Shreyas Srinivas (May 21 2024 at 18:13):

So proofs would get hideous

Shreyas Srinivas (May 21 2024 at 18:14):

Also, I don't recall the specific optimisations exactly, but C compilers can heavily optimise malloc calls, and pointer fetches.

Shreyas Srinivas (May 21 2024 at 18:23):

I see no point in trying to outdo GCC

Henrik Böving (May 21 2024 at 18:26):

Lean's output gets compiled with clang btw.

Shreyas Srinivas (May 21 2024 at 18:26):

Ah so we do have a way to get a wasm backend

Shreyas Srinivas (May 21 2024 at 18:27):

But my point applies to clang too.

Henrik Böving (May 21 2024 at 18:32):

Shreyas Srinivas said:

Also, I don't recall the specific optimisations exactly, but C compilers can heavily optimise malloc calls, and pointer fetches.

I mean I don't really see how this applies here? Currently you have this layout for Array: https://github.com/leanprover/lean4/blob/master/src/include/lean/lean.h#L155C1-L160C21 and then there is an additional (tagged pointer) integer floating around somewhere as the dimension parameter that's externalized in your structure. There is no way that a C compiler could figure this out at the precision that basically declaring a builtin type:

/-- `Vector α n` is an `Array α` whose size is statically fixed to `n` -/
structure Vector (α : Type u) (n : Nat) where
  /--Internally, a vector is stored as an array for fast access-/
  toArray : Array α
  /--`size_eq` fixes the size of `toArray` statically-/
  size_eq : toArray.size = n
deriving Repr, BEq, DecidableEq

with a memory representation of:

typedef struct {
    lean_object   m_header;
    size_t        m_size;
    lean_object * m_data[0];
} lean_vector_object;

could have. It's just that doing this optimization is almost meaningless in the first place.

Yuyang Zhao (May 21 2024 at 18:44):

I'm not sure, because I'll be using small Vectors a lot. But I can just trust you : )

Mario Carneiro (May 21 2024 at 18:51):

Yuyang Zhao said:

Would it be possible later to change the runtime or have another type to not store size and capacity to make it more lightweight? I think most of the time we do not need to change its size, otherwise we should use Array?

I thought about this, and my conclusion was that it's not possible, because the garbage collector (which has no type information and can only see the graph structure of lean_objects because they all have enough tagging to differentiate) needs to know both how large the original allocation is and how many of the cells are initialized to clean up properly after it.

Shreyas Srinivas (May 21 2024 at 20:06):

Henrik : I wasn't looking at the implementation specifics of lean. I see your point.

Shreyas Srinivas (May 21 2024 at 20:07):

Shreyas Srinivas said:

Shreyas Srinivas said:

Update : 3 PRs ready for review

batteries#803

batteries#801 -> batteries#793

Review request:
I am a bit nervous to chain more PRs on top of these. I would like to request reviews of these PRs so that they get merged before I add anymore. Not quite sure whom to ping.

Making sure this doesn't message doesn't get lost in the discussion.

Kim Morrison (May 21 2024 at 23:02):

Shreyas Srinivas said:

Review request:
I am a bit nervous to chain more PRs on top of these. I would like to request reviews of these PRs so that they get merged before I add anymore. Not quite sure whom to ping.

Making sure this doesn't message doesn't get lost in the discussion.

Currently only @Mario Carneiro and I have merge rights on Batteries, and both of us seem to be pretty busy. This bottleneck needs an solution soon, and we are working on it!

Further reviews by others are super helpful, in any case (whether asking for more changes or giving approval).

François G. Dorais (May 22 2024 at 13:58):

The smaller related PRs batteries#801 and batteries#803 each have two independent approvals. They are ready to merge.

James Gallicchio (May 22 2024 at 21:28):

Mario Carneiro|110049 said:

I thought about this, and my conclusion was that it's not possible, because the garbage collector (which has no type information and can only see the graph structure of lean_objects because they all have enough tagging to differentiate) needs to know both how large the original allocation is and how many of the cells are initialized to clean up properly after it.

(until we reach the promised land of linear type annotations, at which point it should be possible to have static-length arrays with the length not stored in the object)

Mario Carneiro (May 22 2024 at 21:29):

no, I'm saying that's not sufficient without the promised-er land of a compiler that actually makes use of typing information for memory layout and with static garbage collection a la rust

Mario Carneiro (May 22 2024 at 21:31):

An array without length information is similar to an unboxed tuple, it can't be passed to anything type-erasing without overhead

Kim Morrison (May 23 2024 at 01:11):

François G. Dorais said:

The smaller related PRs batteries#801 and batteries#803 each have two independent approvals. They are ready to merge.

I've hit merge on both of these, without looking. :-(

François G. Dorais (May 23 2024 at 01:19):

Thanks Kim!

@Shreyas Srinivas: you can now git merge main into the batteries#793 PR to update.

Shreyas Srinivas (May 23 2024 at 09:31):

Done. PR batteries#793 passes CI and has one approval. Waiting for more reviews or a merge. Now the CI runs automatically. As I suspected, first time PRs need maintainers approval for running CI (Francois did confirm this)

Shreyas Srinivas (May 23 2024 at 16:29):

The PR is again all-green. Francois golfed the proofs.

Shreyas Srinivas (May 24 2024 at 07:41):

Apologies for messaging again. I am looking for approvals/merge/more reviews on batteries#793

There is already one approving review. The sooner this PR gets through the sooner we can have static vectors in batteries.

Eric Wieser (May 24 2024 at 22:32):

Reiterating my comment elsewhere, I don't think this PR should be merged until an adaptation PR is passing CI in mathlib (as otherwise mathlib gets stuck for a while and can't pick up other batteries releases)

Shreyas Srinivas (May 24 2024 at 22:54):

Okay. I can do this ~24 hours from now

Shreyas Srinivas (May 24 2024 at 22:57):

The simplest companion PR puts Mathlib's Vector in a namespace and calls the namespaced version in the two places it is used.

Shreyas Srinivas (May 24 2024 at 22:57):

Once this is merged, batteries#793 should be safe to merge.

Shreyas Srinivas (May 24 2024 at 22:59):

Once batteries Vector is available, Mathlib's Vector can be transitioned to this API and the namespace removed. As I see it, this is the smoothest path, involving the least tinkering I can think of.

Shreyas Srinivas (May 24 2024 at 23:03):

It also gives us great flexibility to work with the Mathlib Vector API

CC: @Mario Carneiro , @Kim Morrison and @Eric Wieser to avoid discussing across three threads

Mario Carneiro (May 24 2024 at 23:03):

There is no need to ping this PR every day. We're aware of it

Shreyas Srinivas (May 24 2024 at 23:04):

No this is about how to manage this and Mathlib's Vector. The discussion is spread across three threads

Shreyas Srinivas (May 24 2024 at 23:05):

Since Eric mentioned the companion PR as a requirement to merging this PR, I thought this is the best place to decide on what is to be done.

Mario Carneiro (May 24 2024 at 23:05):

the mathlib PR should probably be discussed on #mathlib4

Shreyas Srinivas (May 24 2024 at 23:06):

My proposed plan involves both this PR and the Mathlib PR

Mario Carneiro (May 24 2024 at 23:06):

but I agree that having a plan to not break mathlib is a prerequisite for landing this PR

Shreyas Srinivas (May 24 2024 at 23:07):

Shreyas Srinivas said:

The simplest companion PR puts Mathlib's Vector in a namespace and calls the namespaced version in the two places it is used.

Specifically the plan starts with this message

Mario Carneiro (May 24 2024 at 23:07):

OK, sgtm

Mario Carneiro (May 24 2024 at 23:08):

that can probably be landed in advance anyway

Shreyas Srinivas (May 24 2024 at 23:09):

Yes. This plan works without creating dependencies.

François G. Dorais (May 25 2024 at 03:42):

Kim Morrison said:

There is general enthusiasm in the FRO for working on this as Batteries.Vector rather than _root_.Vector.

If namespaced in Batteries, as Kim suggested, then there are no downstream issues at all. That would also make migration simpler if core puts a new Vector type in the root namespace.

Shreyas Srinivas (May 25 2024 at 07:01):

If the FRO doesn't want _root_.Vector, then Mathlib's Vector would also need to go into a namespace

Eric Wieser (May 25 2024 at 07:34):

Shreyas Srinivas said:

I thought this is the best place to decide on what is to be done.

Surely your previous thread in #mathlib 4 is the right place? Please try not to restart discussions in different channels.

Eric Wieser (May 25 2024 at 07:35):

(maybe this is my fault for not linking that thread in my message above)

Shreyas Srinivas (Jun 05 2024 at 11:50):

Hi, it has been over a week. I just want to check in and ask if I can get this PR reviewed or merged. It already has an approving review. The mathlib PR to test this was approved last week. I would like to use this API soon if I can. I apologise if this is too soon to send a reminder.

Shreyas Srinivas (Jul 09 2024 at 17:04):

I tried upgrading this batteries#793 to 4.10.0-rc1. When doing so, I ran into a surprising issue. Previously, in the Lemmas.lean file:

@[simp]
theorem map_empty (f : α → β) : map f empty = empty :=
  rfl

The above just worked. Now rfl doesn't work. I guess this is due to some changes in the way either rfl or reducibility works by default, but I can't pinpoint the cause or the fix.

Shreyas Srinivas (Jul 09 2024 at 17:08):

PS: I also noticed there is no default gitpod config for batteries and no link shows up on PRs. I have been using gitpod for mathlib recently and now I wonder if it makes sense to add something for batteries.

Jannis Limperg (Jul 09 2024 at 17:16):

Is map defined by well-founded recursion? If so, it's now irreducible by default.

Shreyas Srinivas (Jul 09 2024 at 17:16):

I tried unseal map in..

Shreyas Srinivas (Jul 09 2024 at 17:17):

The result :

failed to set `[local semireducible]`, `Batteries.Vector.map` is currently `[semireducible]`, `[irreducible]` expected
use `set_option allowUnsafeReducibility true` to override reducibility status validation

Kim Morrison (Jul 09 2024 at 17:18):

So it is something further down the definitional chain that has become irreducible.

Shreyas Srinivas (Jul 09 2024 at 17:20):

Is there a way to find out which one?

Shreyas Srinivas (Jul 09 2024 at 17:21):

Some debug trace for rfl?

Henrik Böving (Jul 09 2024 at 17:22):

#reduce map f empty with a sufficient binder for f

Shreyas Srinivas (Jul 09 2024 at 17:23):

I think it is Array.mapM.map

Henrik Böving (Jul 09 2024 at 17:23):

Can you prove it.

Shreyas Srinivas (Jul 09 2024 at 17:25):

No, but digging down, it is the closest well founded recursive function

Henrik Böving (Jul 09 2024 at 17:25):

Well just run reduce and see where you get stuck?

Henrik Böving (Jul 09 2024 at 17:26):

Oh maybe reduce uses kernel reduction which ignores this actually :/

Henrik Böving (Jul 09 2024 at 17:27):

Well either way if Array.map is at fault you should be able to just unfold your map and use a simp call to verify your theorem, that seems better than relying on some defeq anyways

Shreyas Srinivas (Jul 09 2024 at 17:29):

So unseal Array.mapM.map worked.

Shreyas Srinivas (Jul 09 2024 at 17:30):

But a funny thing is unseal had to be put in before the docstring.

Shreyas Srinivas (Jul 09 2024 at 17:30):

It was unintuitive at first.

Shreyas Srinivas (Jul 09 2024 at 17:31):

It was a syntax error anywhere else. EDIT : I would think putting unseal just under the @[simp] line and above the def line was the correct syntax.

Henrik Böving (Jul 09 2024 at 17:31):

Using the unseal here seems like a pretty bad idea to me, you are relying on some hyperspecifics of the way that your underlying API is implemented. Why not unfold + simp instead.

Shreyas Srinivas (Jul 09 2024 at 17:32):

You get stuck in a loop. simp [map, empty, Array.map, Array.mapM, Array.mapM.map] gives a time out.

Shreyas Srinivas (Jul 09 2024 at 17:33):

I think there's also an Id.run somewhere in there.

Henrik Böving (Jul 09 2024 at 17:33):

But you aren't supposed to unfold Array.map and mapM with your loop, there should be a theorem somewhere that claims mapping over the empty array yields the empty array

Shreyas Srinivas (Jul 09 2024 at 17:33):

There isn't one.

Shreyas Srinivas (Jul 09 2024 at 17:34):

At least not now.

Henrik Böving (Jul 09 2024 at 17:34):

Well then that might be a good contribution to make instead of comitting to unstable proof schemes ^^

Shreyas Srinivas (Jul 09 2024 at 17:34):

I do recall getting the theorem name from Array but a quick loogle search for the same name turned up empty.

Shreyas Srinivas (Jul 09 2024 at 17:36):

So if it was indeed removed, I am not sure adding it is wise on my part.

Shreyas Srinivas (Jul 09 2024 at 17:41):

On second thoughts, I might have obtained it from Mathlib Vectors.

Shreyas Srinivas (Jul 09 2024 at 17:46):

I need to commute now. I'll make a PR once I am home.

Shreyas Srinivas (Jul 09 2024 at 18:05):

OTOH, I did want to ask again if the intended syntax for unseal is to put it before docstrings. For large docstrings, this might make it hard to keep track this line

François G. Dorais (Jul 09 2024 at 19:22):

I don't think simp is the right way to deal with broken rfl proofs due to irreducible defs. @Joachim Breitner and I briefly discussed this in batteries#832 but I think we both went on to more pressing matters and never looked back.

For the moment, I tend to prefer with_unfolding_all rfl even though that throws a big huge rug over the problem. But simp is another big rug, which is admittedly easier to trim to size, but it's also not guaranteed to work at all as a replacement.

Joachim Breitner (Jul 09 2024 at 19:54):

I’d say unseal is better than with_unfolding_all, but would say simp or rw is even more proper – they use the .eq_* equational theorems that are the public API for a function defined by well-founded recursion, whereas rfl considers the internal construction.

Joachim Breitner (Jul 09 2024 at 19:55):

Shreyas Srinivas said:

OTOH, I did want to ask again if the intended syntax for unseal is to put it before docstrings. For large docstrings, this might make it hard to keep track this line

unseal … in work like set_option … in and precedes a full “command”. The docstring is part of the def command syntax. So while unintuitive, it is justifiable.

Maybe there ought to be unseal … in <tactic>.

Shreyas Srinivas (Jul 09 2024 at 19:58):

Joachim Breitner said:

I’d say unseal is better than with_unfolding_all, but would say simp or rw is even more proper – they use the .eq_* equational theorems that are the public API for a function defined by well-founded recursion, whereas rfl considers the internal construction.

So is it justified to use rfl for an Array.map_empty API function?

François G. Dorais (Jul 09 2024 at 21:11):

Joachim Breitner said:

I’d say unseal is better than with_unfolding_all, but would say simp or rw is even more proper – they use the .eq_* equational theorems that are the public API for a function defined by well-founded recursion, whereas rfl considers the internal construction.

I agree that unseal is better than with_unfolding_all because it's a localized fix rather than a big rug. But it relies on knowing exactly what def to unseal, which is not trivial and we have no automated tools for.

My issues with simp are that (a) it does a whole lot more than just applying .eq_* lemmas, and (b) it works bottom-up instead of top-down.

I still think we need a tactic that is specialized for dealing with this general issue.

Joachim Breitner (Jul 09 2024 at 21:34):

When I say simp I am inclined to mean something like simp only […]. With that I am not sure I see the problem. But you can also use rw if simp seems too poweful.

In this particular case I think the problem is that Array.map itself has an imcomplete API, it seems, and there should be an Array.map_empty that properly hides the construction of Array.map, and then Vector.map would be an thin wrapper around Array.map_empty.

Shreyas Srinivas (Jul 09 2024 at 21:55):

François G. Dorais said:

I still think we need a tactic that is specialized for dealing with this general issue.

I guess what's needed is to set reducibility at a per tactic level or tactic variants with different reducibility settings.

Shreyas Srinivas (Jul 09 2024 at 21:55):

Like with_reducible rfl

Shreyas Srinivas (Jul 09 2024 at 23:13):

Is there a consensus on adding Array.map_empty? Is it okay to use unseal Array.mapM and rfl there?

Shreyas Srinivas (Jul 09 2024 at 23:28):

I'll be offline for 12 hours or more. If a consensus is reached and anyone wishes to PR it in the meantime, I welcome it.

François G. Dorais (Jul 09 2024 at 23:30):

Joachim Breitner said:

When I say simp I am inclined to mean something like simp only […]. With that I am not sure I see the problem. But you can also use rw if simp seems too poweful.

Yes, this is what I meant when I said that simp is "easier to trim to size". But this is still not a good solution, simp is easy to run into loops and mayhem, whereas (with_unfolding_all or not) rfl stops as soon as unification fails and can never loop.

We do have tools to trim down simps, but simp needs to work first!

Joachim Breitner said:

In this particular case I think the problem is that Array.map itself has an imcomplete API, it seems, and there should be an Array.map_empty that properly hides the construction of Array.map, and then Vector.map would be an thin wrapper around Array.map_empty.

That is true and it should be fixed. But you need to be able to identify the problem first! The sudden failure of rfl in this case doesn't identify the problem at all and it's hard to dig at it.

François G. Dorais (Jul 10 2024 at 02:27):

Is it time to revive unification hints? They're already implemented but not very much used.

Yaël Dillies (Jul 10 2024 at 03:44):

François G. Dorais said:

Joachim Breitner said:

I’d say unseal is better than with_unfolding_all, but would say simp or rw is even more proper – they use the .eq_* equational theorems that are the public API for a function defined by well-founded recursion, whereas rfl considers the internal construction.

My issues with simp are that (a) it does a whole lot more than just applying .eq_* lemmas, and (b) it works bottom-up instead of top-down.

It can be made to work top-down inside an expression?

Shreyas Srinivas (Jul 10 2024 at 11:17):

batteries#877 is rather short

Shreyas Srinivas (Jul 10 2024 at 11:20):

Hopefully it can be merged soon

Shreyas Srinivas (Jul 10 2024 at 12:19):

To clarify, batteries#877 adds the array API lemma map_empty. It is ready for review (can't add labels)

Shreyas Srinivas (Jul 19 2024 at 21:37):

Update : This issue has been properly fixed. batteries#793 is again ready for review/merge. Vector.map_empty uses the corresponding Array.map_empty theorem.

Last updated: May 02 2025 at 03:31 UTC