Zulip Chat Archive
Stream: batteries
Topic: RFC: Arrays with fixed length
Mario Carneiro (Mar 28 2024 at 21:12):
The fact that we don't have a type {arr : Array A // arr.size = n}
is a consistent source of pain when doing monadic verification of code containing arrays. The set
function can potentially change arr.size
, so any time you are using set
you need to carry around something saying actually the size is still what it was before so that you can still index into it with Fin n
indices. This code is often not just in the verification theorems but also in the code itself, either because it wants to use the non-panicking array indexing function Array.get
or because some property is needed about the contents of the array (necessarily stated only over the elements actually in the array) in order to prove termination. Examples of this include UnionFind
and HashMap
from Std, as well as some larger software verification projects I've been working on, or most recently this example of doing proofs about do notation, which was supposed to be exemplary but ended up being something kind of embarrassing to show.
Defining the type and operations should be relatively straightforward: I mean it to be exactly that subtype (or possibly a structure), and all operations are just filtered through. Code performance should be the same as Array, except that some functions pass n
unnecessarily. Inlining should hopefully mitigate this issue.
Another possible improvement is to use a representation that avoids duplicate information by not storing the length. However:
- This would require changes to the runtime and compiler, and my sense is that this is a way to ensure it won't happen for 1+ year
- Although the title says "Arrays with fixed length", the length here isn't really "fixed", it's only counted externally; one can still have a
push
method and we do not want to lose the dynamic array doubling behavior, so keeping the capacity and length separate is desirable. - The runtime does not have type information, so we need to store both capacity and length in the runtime value regardless, so that the right objects are destroyed when an array object is freed.
Regarding the name, I don't have any good ideas. CArray A n
was used in the previous discussion on dependent arrays (which is a separate thing). In any case I'd like to get this in std soonish. Thoughts? @Joe Hendrix @Scott Morrison
Joachim Breitner (Mar 28 2024 at 21:34):
Yes, please.
Although I found that where I wanted such arrays, I quickly also wanted a variant where the element type can depend on their position, when nesting them. But I guess that digresses the thread…
Mario Carneiro (Mar 28 2024 at 21:35):
yes, that would be the "dependent array" case
Mario Carneiro (Mar 28 2024 at 21:36):
that one has some additional runtime implications though (you can't get inline layout if all the types in the array are different)
Mario Carneiro (Mar 28 2024 at 21:37):
my current way of handling such cases with regular arrays (when the "underlying type" is actually the same, i.e. no storing a Nat
next to a String
) is to have a hypothesis on the side saying the property that holds of every element of the array
Joe Hendrix (Mar 28 2024 at 21:38):
I think this is common enough that it could go in Lean itself. I discussed the homogenous case with Scott yesterday since it was useful for a formalization of AES. Here's a link to that implementation if folks are interested and AES using it is here.
Mario Carneiro (Mar 28 2024 at 21:39):
Is it going to be used in lean?
Joe Hendrix (Mar 28 2024 at 21:44):
Potentially, but that's just at the discussion stage.
I've had some discussions (including with you) where we upstream more data structures to reduce duplication and improve consistency (e.g., unify Std.HashMap
and Lean.HashMap
).
Another factor is that Vector
and BitVec
should use consistent names and lemmas and it may be easier if they are in the same package.
Mario Carneiro (Mar 28 2024 at 21:45):
I think that if we upstream it at all we should do so later after the API is settled
François G. Dorais (Mar 28 2024 at 21:46):
This is definitely needed. However, Subarray
already has most of the required features and some extra (and some bugs). So why not just overhaul Subarray
?
I've often thought that maybe Subarray
should be the main array type and Array
should be a low-level array-storage type.
Mario Carneiro (Mar 28 2024 at 21:50):
Another factor is that Vector and BitVec should use consistent names and lemmas and it may be easier if they are in the same package.
This is completely unrelated, we can name them whatever in whatever repo they end up in. Besides this, I would say that BitVec
is the outlier here, all of the array like types are called Array
except for BitVec
Mario Carneiro (Mar 28 2024 at 21:52):
I'm not against the name Vector
, but it may be problematic for mathlib which has actual vector spaces and may not like us taking the name for something else
Kyle Miller (Mar 28 2024 at 21:58):
@François G. Dorais docs#Subarray has the complexity that it contains elements outside the range, so it doesn't have the right extensionality lemma. I think it's necessary that Subarray
work like this, because it's supposed to be an inexpensive view into an Array
.
There's a way around this by using a Quot
, but that adds a lot of complexity.
There are also the issue that Subarray
doesn't expose the length of the subarray as a parameter, which I don't think you'd want to have exposed that way.
Eric Wieser (Mar 28 2024 at 21:58):
Mathlib has made do with docs#Vector referring to List
s of fixed length forever, so I don't think that's a concern
Mario Carneiro (Mar 28 2024 at 21:58):
you also can't push
to a Subarray
Mario Carneiro (Mar 28 2024 at 21:59):
let's move the discussion about movement from Std to Init to another topic
Joe Hendrix (Mar 28 2024 at 22:03):
I noticed was that once one has arrays with external lengths, one often needs matrices quickly afterwards. Matrices seem like an addition that should be deferred though.
Joachim Breitner (Mar 28 2024 at 22:41):
Mario Carneiro said:
that one has some additional runtime implications though (you can't get inline layout if all the types in the array are different)
With inline layout, do you mean a packed non-pointer representation? Do we have that in lean arrays now?
Henrik Böving (Mar 28 2024 at 23:08):
No, only ByteArray
and FloatArray
have this power right now afaik
François G. Dorais (Mar 28 2024 at 23:24):
Kyle Miller said:
François G. Dorais docs#Subarray has the complexity that it contains elements outside the range, so it doesn't have the right extensionality lemma. I think it's necessary that
Subarray
work like this, because it's supposed to be an inexpensive view into anArray
.
Indeed, I didn't flesh out my thoughts here. You're right that Subarray
is not a perfect alternative to fixed-length arrays. They do need an overhaul because they are currently not "an inexpensive view into an Array
" because of bugs like lean4#2360.
My second sentence is the one I should have fleshed out more. The idea is to move the current Array
to a more bare-bones type, probably unsafe, not for direct public consumption, that just handles memory storage operations. And then have the actual public Array
be implemented more like Subarray
, with more transparent operations where it is visible when a copy can or cannot happen and where operations like set
transparently don't modify size
.
That said, this is along the same lines as Mario's "Another possible improvement" from above, so not for immediate consideration.
There are also the issue that
Subarray
doesn't expose the length of the subarray as a parameter, which I don't think you'd want to have exposed that way.
I believe this is what computed fields are for.
François G. Dorais (Mar 28 2024 at 23:28):
Henrik Böving said:
No, only
ByteArray
andFloatArray
have this power right now afaik
But the infrastructure for more scalar arrays is already there. I once played with implementing different scalar arrays (e.g. native UInt*
arrays). It is possible but tricky to do well from outside core.
Mario Carneiro (Mar 29 2024 at 01:42):
/poll What should the name be for arrays of fixed size?
CArray
Array'
Array (and change the core type to something else)
DArray
Vec
Vector
François G. Dorais (Mar 29 2024 at 02:05):
What's the D
in DArray
?
Mario Carneiro (Mar 29 2024 at 02:06):
dependent, as in "dependent type over length"
Mario Carneiro (Mar 29 2024 at 02:06):
It's a dependent type the same way Fin n
is
Mario Carneiro (Mar 29 2024 at 02:07):
but it has an unfortunate collision with the other way that arrays could be dependent, i.e. storing different types in each slot of the array
Mario Carneiro (Mar 29 2024 at 02:08):
Note that the lean 3 name of this type is array
; the type now called Array
was called buffer
and they were defined in the opposite order, with buffer α := Σ n, array α n
François G. Dorais (Mar 29 2024 at 02:26):
Interesting historical note. Thank you! I think I would prefer it the old way but that would mean a major runtime overhaul, so not anytime soon.
Joachim Breitner (Mar 29 2024 at 10:40):
I’d reserve DArray
for the index-dependent version
DArray (size : Nat) (i : Fin size → Type _) : Type _
(Or DVector
if the above becomes Vector
)
Alex Keizer (Mar 29 2024 at 14:27):
At the risk of even more bike-shedding: Do we want to think about what the name should be for List
s of fixed length as well? These are currently called Vector
in mathlib, which conflicts with the most popular option of the poll.
Eric Wieser (Mar 29 2024 at 22:00):
If we want to copy C++, then Vector (or Vec to match rust) should be the variable-length array and Array should be the fixed length one
Mario Carneiro (Mar 29 2024 at 22:01):
well, even the fixed length one isn't really an Array in the C++ sense
Mario Carneiro (Mar 29 2024 at 22:01):
because it's not actually fixed length, it has all the same dynamic array behaviors as the other type
Mario Carneiro (Mar 29 2024 at 22:02):
there is still potentially some space for an actual fixed size Array type like the haskell type
Alex Keizer (Apr 01 2024 at 11:38):
That distinction is mostly about whether the data lives on the stack vs the heap, which is not a thing at all for us. As Mario points out, mathlibs Vector
is not fixed-length, it is statically-known length. I'd say in this case we shouldn't worry too much about mirroring C++'s Array vs Vector distinction
Eric Wieser (Apr 01 2024 at 11:48):
I don't agree that C++'s array vs vector is stack vs heap; unique_ptr<array<T,n>>
lives on the heap, and std::vector<T,customalloc>
can (probably) live on the stack.
Alex Keizer (Apr 01 2024 at 13:12):
Ah, I guess my understanding of C++ was flawed
Mario Carneiro (Apr 01 2024 at 23:05):
I propose we go forward with the name Vector
, unless others object, cc: @Joe Hendrix @Scott Morrison . I'll try to open a PR based on Joe's code today or tomorrow and add some lemmas.
As for the mathlib type of the same name, I think it won't be too bad to rename it since mathlib also has "vector2" and "vector3" (I think vector2 got deleted at some point though). We can open a topic on #mathlib4 to discuss it
Eric Wieser (Apr 01 2024 at 23:12):
What's the origin of the "vector" terminology for fixed length? Rust, C++, Go, and Haskell all seem to use Vector
for dynamic-length objects.
Eric Wieser (Apr 01 2024 at 23:15):
vector2
never existed as a type; data/vector2.lean
was a file about Vector
that couldn't call itself data.vector
because this clashed with a file of the same name in core.
Mario Carneiro (Apr 01 2024 at 23:16):
this is a dynamic length object though, that's what my point was above
Mario Carneiro (Apr 01 2024 at 23:17):
I'm not even sure I would call it a type with "statically-known length", because in dependent type theory everything is a bit wishy washy regarding the static/dynamic dichotomy (one of the reasons compiling DTT to something other than a uniform object model is hard)
Eric Wieser (Apr 01 2024 at 23:17):
I guess my question is really then "is there any precedent for using the combination of the names Array
and Vector
in this way"?
Mario Carneiro (Apr 01 2024 at 23:25):
Another point in favor of using Vector
for the fixed length array type instead of the dynamic length one is that arrays of length n actually form a vector space (assuming the elements of the array are a ring or module), unlike dynamic length arrays (you have to do something funny with truncating or extending to fit if you want to add arrays of different sizes, and this process means that the space of all Array A
is not a vector space without either subtyping it or quotienting it first).
Mario Carneiro (Apr 01 2024 at 23:28):
In fact, I believe the obvious vector space structure on Vector A n
is exactly how the word "vector" first came to be associated with these long lists of numbers in computing (I'm thinking of examples like vector registers). Things like std::vector<T>
are then so named because they "generalize" the concept of these fixed length lists, in the process losing the "vector space" part
Eric Wieser (Apr 01 2024 at 23:30):
Apparently in C++
It's called a vector because Alex Stepanov, the designer of the Standard Template Library, was looking for a name to distinguish it from built-in arrays. He admits now that he made a mistake, because mathematics already uses the term 'vector' for a fixed-length sequence of numbers. C++11 compounds this mistake by introducing a class 'array' that behaves similarly to a mathematical vector.
Eric Wieser (Apr 01 2024 at 23:33):
Should the titular object be called Array.Vector
to leave room for List.Vector
? Perhaps the former could be exported to allow Vector
to refer to it?
Mario Carneiro (Apr 01 2024 at 23:33):
yeah I was hoping we would get a name that generalizes more clearly to the list case
Mario Carneiro (Apr 01 2024 at 23:34):
I was thinking of proposing something like LVector
for the mathlib type
Mario Carneiro (Apr 01 2024 at 23:35):
We don't have much precedent for putting types in the namespace of other types when they are both top level data structures like this
Eric Wieser (Apr 01 2024 at 23:38):
String.Pos comes to mind, but "top level" is somewhat subjective
Mario Carneiro (Apr 01 2024 at 23:39):
There are examples which are like preconditions or properties on their parent (e.g. HashMap.WF
), but in such cases there is usually a clear hierarchy between the notions
Mario Carneiro (Apr 01 2024 at 23:41):
same thing with String.Pos
, which although it does not itself have a type dependent over String
, is only usable via methods in the String
namespace and really only carries semantics with respect to a string (not even the type String
but a particular string. Unfortunately the type doesn't really help with getting this association right).
Mario Carneiro (Apr 01 2024 at 23:42):
with Vector
the idea is really that you can use the API without ever having to have or refer to an Array
Mario Carneiro (Apr 01 2024 at 23:49):
But more than anything else, I'm most interested in getting all the comments about the name out now so that we aren't blocked on it for too long. Having the type itself in some form is more important than getting the name right, and while I would prefer not to think about what a pain renaming it would be there are already name changes of similar impact in the pipeline, waiting for that time when core gets a naming czar or otherwise finds a way around the current stalemate.
Alex Keizer (Apr 02 2024 at 10:04):
Eric Wieser said:
What's the origin of the "vector" terminology for fixed length? Rust, C++, Go, and Haskell all seem to use
Vector
for dynamic-length objects.
I've seen Idris also use "Vector" for this type. Their docs say:
A standard example of a dependent data type is the type of “lists with length”, conventionally called vectors in the dependent type literature.
Alex Keizer (Apr 02 2024 at 10:04):
So there is precedent for the use of Vector, at least. Not sure about the Array part
Mario Carneiro (Apr 02 2024 at 10:06):
I would have preferred we use the name Array
for non-dynamic arrays (we don't have these in lean right now, but the API would be something like Haskell's Array
type)
Tomas Skrivan (Apr 02 2024 at 14:15):
In my library, I use ArrayN
, ListN
and DataArrayN
which are all variants with the size specified in the type. I personally prefer suffix from prefix as it helps with discoveribility together with autocomplete. (DataArray
is an array that stores only types with fixed byte size and uses one contiguous block of memory)
Tomas Skrivan (Apr 02 2024 at 14:19):
Also I do not index them by Fin n
but by a generic type I
which has an instance of IndexType I
(class from LeanColls)
Mario Carneiro (Apr 02 2024 at 14:32):
you can add your suggestion to the list, but I guess it's probably past the initial rush at this point
Mario Carneiro (Apr 02 2024 at 14:34):
also keep in mind that if we did use those names then you'd have to change the name of your structures, since this one definitely won't be using IndexType
Joachim Breitner (Apr 02 2024 at 19:53):
Using Vector
for this in lean, will that set precedence for “We are happy to use names in their common non-mathematical, possibly CS inspired sense, even if it overlaps with a common mathematics terms”? I might expect some beginners approach lean and mathlib from the math side might be a bit surprised that there is a Vector
that has very little to do with the mathematical notion with that name. (Probably not a big deal, but I’d like to highlight it here.)
Mario Carneiro (Apr 02 2024 at 19:55):
I think the discussion just above indicates the opposite? C++ is the one that "got it wrong", as the quote from Stepanov above indicates, and although we clearly arrived at the opposite naming by some different path-dependence, that seems better on the whole for us
Mario Carneiro (Apr 02 2024 at 19:57):
Granted, Vector
is still only loosely related to vector spaces and still has a significant CS bent, but it seems reasonable for a language straddling the verified software / theorem proving boundary like Lean
Mario Carneiro (Apr 02 2024 at 19:57):
I don't expect us to actually have a VectorSpace R (Vector R n)
instance, although I suppose it's conceivable
Mario Carneiro (Apr 02 2024 at 20:01):
But if we view this as a diff from current lean/std/mathlib, it's actually not that big: Vector A n
from mathlib got upstreamed along with an implementation-detail data structure swap of List
to Array
for better performance
Mario Carneiro (Apr 02 2024 at 20:01):
and another type Vector2 A n
(name TBD) was added to mathlib with the old implementation for backward compatibility
James Gallicchio (Apr 03 2024 at 00:46):
Alex Keizer said:
Do we want to think about what the name should be for
List
s of fixed length as well?
i personally quite like having a single FixSize transformer that works for any collection that has a size
, but this is all part of the LeanColls abstraction brainrot...
Joe Hendrix (Apr 03 2024 at 04:57):
I am a little surprised how much discussion there is on this. Vector
is used in Coq. I used Vec
in the code linked above since AES (and other crypto) tends to need that type a lot and a concise name was useful. The AES specification came from Cryptol which uses [n]tp
to denote a vector of length n
with elements of type tp
, but that notation doesn't work in Lean.
My personal preference is Vector
or Vec
since that terminology is fairly widely and teven if there is no explicit vector space, they are closely related to vectors in a mathematical sense.
Kim Morrison (Apr 03 2024 at 05:59):
I'm happy with Vector.
Kim Morrison (Apr 03 2024 at 05:59):
The usage of Vector in Mathlib is fairly incidental and if there are any collisions Mathlib can move.
Mario Carneiro (Apr 03 2024 at 07:54):
@Joe Hendrix The Vector
used in Coq is not the same. It is actually a type we use frequently as an example of an inductive family, but we don't actually define it in mathlib (IIRC), this is what one of the vector2 vector3 types was about. It has some aspects that make it less desirable than mathlib's Vector
type, but it has approximately the same layout: a linked list, but with length stored in each node in addition. This is isomorphic to the fixed-length array type being proposed here but quite different in its implementation. The Coq type is:
inductive Vector (α : Type u) : Nat → Type u
| nil : Vector α 0
| cons : α → Vector α n → Vector α (n + 1)
Mario Carneiro (Apr 03 2024 at 08:00):
The type [n]tp
in Cryptol, IIUC, is most similar to C++ std::array<T, N>
or rust [T; N]
, which is to say a non-dynamic array with a statically known size. quoting from "Programming Cryptol":
This is probably obvious, but there is no way to get a value variable to appear in a type context. Types must be
known at “compile time,” and (non-literal) values are not, so there is no way to use them in that way.
In other words, although you can use "constexpr" style evaluation inside the size of an array, it's not truly a dependent type, and it is fixed at compile time.
Mario Carneiro (Apr 03 2024 at 08:03):
Now I'm not too fussed about this difference in semantics between Lean and other languages with compile-time const generics that look vaguely DTT-like at first glance, because Lean does not really have "constexpr" (although maybe it might get it someday when the compiler gets good :tm: ), but I would say it is at least a concern worth registering that we are potentially blocking a future type that will want this "Vector" name which is closer to the vector type in the aforementioned languages.
Shreyas Srinivas (May 11 2024 at 17:21):
What's the latest news on this? Is there a decision to include arrays with static length?
Mario Carneiro (May 11 2024 at 17:22):
awaiting implementation
Shreyas Srinivas (May 11 2024 at 17:23):
In what ways should this be different from the Vector a n
type in Mathlib?
Mario Carneiro (May 11 2024 at 17:24):
isomorphic, but based on arrays instead of lists
Shreyas Srinivas (May 11 2024 at 17:25):
Okay. I can get cracking on this. It will be useful for fair division problems. I need a faster vector type to manipulate allocations
Shreyas Srinivas (May 11 2024 at 17:35):
And also for some circuit reasoning
Shreyas Srinivas (May 11 2024 at 17:47):
@Mario Carneiro : Should I pick the name Vector?
Shreyas Srinivas (May 11 2024 at 23:12):
I have started a PR at batteries#793 @Mario Carneiro
Shreyas Srinivas (May 11 2024 at 23:13):
It is by no means complete, but it attempts to be a "port" of Mathlib.Data.Vector as much as possible. Most (all?) remaining errors are due to Mathlib dependencies
Shreyas Srinivas (May 11 2024 at 23:14):
(deleted)
Shreyas Srinivas (May 11 2024 at 23:34):
The only remaining errors are in Section Accum
. I am guessing it might make sense to add toArray
versions of the toList
theorems.
Shreyas Srinivas (May 11 2024 at 23:36):
I would also like to move everything under ### Basic Theorems
to the corresponding Lemmas.lean file. Additionally there is the question of whether the list based definitions of functions should be changed to use Array functions where possible.
Shreyas Srinivas (May 11 2024 at 23:36):
I want to ask first whether all this is fine, before making any further changes.
Kim Morrison (May 12 2024 at 06:13):
Make sure to looks at Joe's existing code https://leanprover.zulipchat.com/#narrow/stream/348111-batteries/topic/RFC.3A.20Arrays.20with.20fixed.20length/near/430166031 above.
Kim Morrison (May 12 2024 at 06:14):
The name decision seems to be have been (in favour of Vector) at https://leanprover.zulipchat.com/#narrow/stream/348111-batteries/topic/RFC.3A.20Arrays.20with.20fixed.20length/near/430188516.
Eric Wieser (May 12 2024 at 08:09):
Does it make sense to upstream Vector as-is first, so that it can be replaced in-place (producing a clearer diff)?
Eric Wieser (May 12 2024 at 08:10):
It's annoying when upstreaming causes my code to break, and there is no diff to look at that suggests why (beyond the file being completely deleted)
Mario Carneiro (May 12 2024 at 08:19):
I wasn't really thinking of this being an upstreaming of mathlib's Vector
type. I think the natural operations are a bit different when using lists vs arrays
Shreyas Srinivas (May 12 2024 at 08:20):
I am trying to salvage as much as possible since you mentioned that you wanted them isomorphic
Mario Carneiro (May 12 2024 at 08:20):
In this case it most certainly is not such an upstreaming because the underlying type and defeqs are changing. The mathlib Vector
type will stay as it is, except for a rename
Yaël Dillies (May 12 2024 at 08:21):
Is anybody actually using the mathlib Vector
?
Mario Carneiro (May 12 2024 at 08:21):
not really
Mario Carneiro (May 12 2024 at 08:22):
sources of inspiration for the API include mathlib's Vector
, the Array
API, and Joe's Vec
Shreyas Srinivas (May 12 2024 at 08:22):
My next goal is to change existing operations from list to array operations where there are complexity speedups for doing so or where proofs can be repaired to work easily
Shreyas Srinivas (May 12 2024 at 08:23):
I don't know if it makes sense to array-fy List.replicate for example.
Mario Carneiro (May 12 2024 at 08:23):
I'm confused, what is the definition of Vector
in your PR?
Shreyas Srinivas (May 12 2024 at 08:23):
Array whose size is equal to n
Mario Carneiro (May 12 2024 at 08:24):
So in that case all the ops should be array ops, no?
Shreyas Srinivas (May 12 2024 at 08:24):
I initially made everything work with the underlying list to check what works. Then I replaced a few operations with array. Doing the same for the remaining ones next
Mario Carneiro (May 12 2024 at 08:25):
The actual implementation should not be using Array.mk
and Array.toList
in most cases
Mario Carneiro (May 12 2024 at 08:25):
these are expensive operations
Shreyas Srinivas (May 12 2024 at 08:26):
Yeah, hence replacing them one by one
Shreyas Srinivas (May 12 2024 at 08:26):
And then going around fixing all the proofs that break
Mario Carneiro (May 12 2024 at 08:26):
I think it would be easier to just take array operations and tuple them with theorems about what they do to the size
Shreyas Srinivas (May 12 2024 at 08:28):
Before I go further, there is a part of the file adapted from Mathlib's vector that won't work because of mathlib dependencies. Should I comment those out for now, or should their upstreaming also happen in this PR?
Mario Carneiro (May 12 2024 at 08:28):
What part?
Shreyas Srinivas (May 12 2024 at 08:28):
The part under section Accum
Shreyas Srinivas (May 12 2024 at 08:29):
There are a couple of lemmas from linear orders
Shreyas Srinivas (May 12 2024 at 08:29):
And the definition of mapAccumr
Mario Carneiro (May 12 2024 at 08:30):
Why does mapAccumr need theorems?
Shreyas Srinivas (May 12 2024 at 08:30):
Not sure it does.
Mario Carneiro (May 12 2024 at 08:31):
All the theorems should be removed from Batteries.Data.Vector.Basic
and moved to Batteries.Data.Vector.Lemmas
Shreyas Srinivas (May 12 2024 at 08:31):
Okay. Got it. That was another of my questions
Shreyas Srinivas (May 12 2024 at 08:32):
@Yaël Dillies : I tried using it, but it is API-wise not as good as Array
Mario Carneiro (May 12 2024 at 08:33):
make sure you don't copy the bad API decisions too...
Shreyas Srinivas (May 12 2024 at 08:34):
Okay, maybe I read too much into the word isomorphic up there
Shreyas Srinivas (May 12 2024 at 08:34):
Should there also be induction theorems based on indices
Shreyas Srinivas (May 12 2024 at 08:34):
And translations to some for of multiset of items in the array?
Mario Carneiro (May 12 2024 at 08:34):
You mean like an ext lemma? Yes
Shreyas Srinivas (May 12 2024 at 08:34):
(can we do that at the moment?)
Shreyas Srinivas (May 12 2024 at 08:35):
Sort of yes. I would also like an induction principle on array length
Mario Carneiro (May 12 2024 at 08:35):
Batteries doesn't have multisets and the vector structure isn't obviously helpful
Shreyas Srinivas (May 12 2024 at 08:35):
It's useful for algorithms to at least have sets
Mario Carneiro (May 12 2024 at 08:36):
Ok but that's a separate discussion
Shreyas Srinivas (May 12 2024 at 08:37):
There are match pattern functions like head and cons. Does it make sense to create and use array versions of them?
Mario Carneiro (May 12 2024 at 08:37):
not really
Mario Carneiro (May 12 2024 at 08:38):
head is fine
Shreyas Srinivas (May 12 2024 at 12:15):
There are a couple of API defs in Prelude that seem to be primarily for use inside metaprogramming. Namely :
Array.appendCore {α : Type u} (as : Array α) (bs : Array α) : Array α
Array.extract (as : Array α) (start stop : Nat) : Array α
Do we want these definitions for Vector
?
Shreyas Srinivas (May 12 2024 at 12:33):
Isn't it better to have a slice structure on top of Array and Vector, and iterators on top of them?
Shreyas Srinivas (May 12 2024 at 12:34):
Another question: Where is the array notation defined?
Mario Carneiro (May 12 2024 at 12:34):
extract yes, appendCore probably not
Shreyas Srinivas (May 12 2024 at 12:47):
Added extract. Next I want to add array notation for vectors. Where is this notation defined?
Mario Carneiro (May 12 2024 at 12:56):
Shreyas Srinivas (May 12 2024 at 13:12):
I made a quick adaptation of that syntax as below:
import Batteries
namespace Vector
def Vector (α : Type u) (n : Nat) := {a : Array α // a.size = n}
/--
`fromArray (a : Array α)` produces a `Vector of the appropriate size
-/
def fromArray (a : Array α) : Vector α (a.size) := ⟨a, by simp⟩
syntax "#v[" withoutPosition(sepBy(term, ", ")) "]" : term
macro_rules
| `(#v[ $elems,* ]) => `(Vector.fromArray #[ $elems,* ])
#check #v[1,2,3,3,4,4]
end Vector
Shreyas Srinivas (May 12 2024 at 13:13):
That's an mwe. at the check one gets a line like this fromArray #[1, 2, 3, 3, 4, 4] : Vector Nat #[1, 2, 3, 3, 4, 4].size
Shreyas Srinivas (May 12 2024 at 13:13):
Is there a way to force that size
in the type signature to evaluate?
Mario Carneiro (May 12 2024 at 13:15):
you could expand to Vector.mk (n := $(elems.size)) #[$elems,*] rfl
Shreyas Srinivas (May 12 2024 at 13:23):
Error :
invalid field 'size', the environment does not contain 'Lean.Syntax.TSepArray.size'
elems
has type
Lean.Syntax.TSepArray `term ","Lean 4
Mario Carneiro (May 12 2024 at 13:25):
.getElems.size
Shreyas Srinivas (May 12 2024 at 13:31):
Doesn't work. size
isn't found. Isn't is simply Array.size from Init?
import Batteries.Data.Array
import Batteries.Data.List.Basic
import Batteries.Data.List.Lemmas
namespace Vector
open Array Lean Meta
/--
`Vector α n` is an `Array α` whose size is statically fixed to `n`
-/
def Vector (α : Type u) (n : Nat) := {a : Array α // a.size = n}
/--
`fromArray (a : Array α)` produces a `Vector α n` of the appropriate size
-/
def fromArray (n : Nat) (a : Array α) (h : n = a.size): Vector α n := ⟨a, by {simp[h]}⟩
/--Syntax for `Vector α n`-/
syntax "#v[" withoutPosition(sepBy(term, ", ")) "]" : term
macro_rules
| `(#v[ $elems,* ]) => `(Vector.fromArray (n := elems.getElems.size) #[$elems,*] rfl)
#check #v[1,2,3,3,4,4] -- unknown identifier 'elems.getElems.size✝'
end Vector
Mario Carneiro (May 12 2024 at 13:32):
you are missing the antiquotation
Shreyas Srinivas (May 12 2024 at 13:36):
Now it works but the size function remains unreduced
Mario Carneiro (May 12 2024 at 13:36):
?
Shreyas Srinivas (May 12 2024 at 13:36):
import Batteries.Data.Array
import Batteries.Data.List.Basic
import Batteries.Data.List.Lemmas
namespace Vector
open Array Lean Meta
/--
`Vector α n` is an `Array α` whose size is statically fixed to `n`
-/
def Vector (α : Type u) (n : Nat) := {a : Array α // a.size = n}
/--
`fromArray (a : Array α)` produces a `Vector α n` of the appropriate size
-/
def fromArray (n : Nat) (a : Array α) (h : n = a.size): Vector α n := ⟨a, by {simp[h]}⟩
/--Syntax for `Vector α n`-/
syntax "#v[" withoutPosition(sepBy(term, ", ")) "]" : term
set_option pp.proofs true
macro_rules
| `(#v[ $elems,* ]) => `(Vector.fromArray (n := #[$elems,*].size) #[$elems,*] rfl)
#check #v[1,2,3,3,4,4] -- can't reduce size
end Vector
Shreyas Srinivas (May 12 2024 at 13:37):
Should I make it an explicit parameter?
Mario Carneiro (May 12 2024 at 13:37):
/--
`Vector α n` is an `Array α` whose size is statically fixed to `n`
-/
structure Vector (α : Type u) (n : Nat) where
toArray : Array α
size_eq : toArray.size = n
/--Syntax for `Vector α n` -/
syntax "#v[" withoutPosition(sepBy(term, ", ")) "]" : term
macro_rules
| `(#v[ $elems,* ]) => `(Vector.mk (n := $(quote elems.getElems.size)) #[$elems,*] rfl)
#check #v[1,2,3,3,4,4]
Shreyas Srinivas (May 12 2024 at 13:39):
Thanks. This quotation syntax remains a bit of wizardry for me
Shreyas Srinivas (May 12 2024 at 14:25):
I am now adding the def
s in Init.Data.Array.Basic
. This is a very large file, so I expect this task to take a few days.
Frédéric Dupuis (May 12 2024 at 16:28):
(deleted)
Kim Morrison (May 12 2024 at 22:16):
I think it works be good to PR the deprecated tag to Mathlib's Vector
sooner rather than later.
Kim Morrison (May 12 2024 at 22:17):
I assumed this is going to Batteries, at least at first?
Shreyas Srinivas (May 12 2024 at 22:17):
This PR is going to batteries. It is far from done. There is a lot of stuff from Arrays, theorems relating to both Arrays and toList, search, sort etc, that is to be added.
Shreyas Srinivas (May 12 2024 at 22:18):
To be on the safer side I'd say this PR won't be done until next weekend, although I might bring it to feature parity with Array and Mathlib.Vector sooner than that
Shreyas Srinivas (May 12 2024 at 22:20):
Once all the pieces are in place. There is some serious reorganisation needed. The defs have to be organised better. Array is spread out all over Init and batteries, and many docstrings are missing. For Vector I want to ensure that at least all defs have a docstring.
Shreyas Srinivas (May 12 2024 at 22:22):
Also, at least two sections from The Mathlib Vector file depend on defs in Mathlib.
Shreyas Srinivas (May 12 2024 at 22:23):
They can't be used as is for the Vector I am designing anyway, but the proofs for the Vector will probably have similar dependencies.
Eric Wieser (May 12 2024 at 22:59):
Is the name going to be Batteries.Vector
or _root_.Vector
?
Shreyas Srinivas (May 12 2024 at 22:59):
I am putting this in Batteries. It is upto maintainers to decide where it gets moved
Mario Carneiro (May 12 2024 at 23:03):
Good question. I was initially leaning toward _root_.Vector
, but then Batteries got booted from leanprover/ so I'm not so sure anymore. Most of the other data structures are in the Batteries namespace
Mario Carneiro (May 12 2024 at 23:05):
It also solves the name clash issue, although not in a very satisfactory way
Mario Carneiro (May 12 2024 at 23:07):
On the other hand this file will almost certainly be consumed by core once it matures, so putting it in _root_
will mean one less rename later
Shreyas Srinivas (May 12 2024 at 23:14):
Array has functions like popWhile
and takeWhile
for which no reasonable counterpart exists for Vector a n
, because one wouldn't know the size of the output in advance
Mario Carneiro (May 12 2024 at 23:14):
then skip them
Kim Morrison (May 12 2024 at 23:14):
Let me consult re: namespacing
Kim Morrison (May 13 2024 at 06:03):
There is general enthusiasm in the FRO for working on this as Batteries.Vector
rather than _root_.Vector
.
François G. Dorais (May 13 2024 at 07:55):
Is there a reason to use a subtype? Personally, I would use
structure Vector (α : Type _) (n : Nat) extends Array α where
ofArray ::
size_eq : toArray.size = n
to avoid indirection.
Note that this allows to transport many Array
ops cleanly and efficiently:
@[inline] def Vector.push : Vector α n → α → Vector α (n+1)
| ⟨a, h⟩, x => ⟨a.push x, h ▸ Array.size_push ..⟩
Quick style note: Batteries, like core, loves auto implicit, so {α : Type u}
should be avoided.
Kim Morrison (May 13 2024 at 07:57):
Yes, should definitely be a structure.
François G. Dorais (May 13 2024 at 07:57):
(I should have read the entire thread rather than just the PR. I see that Mario (and now Kim) had a similar suggestion above.)
Shreyas Srinivas (May 13 2024 at 08:02):
This is also possible with subtypes
Shreyas Srinivas (May 13 2024 at 08:03):
I used it at a few places, but sometimes it is cleaner to write the proof separately
Shreyas Srinivas (May 13 2024 at 08:05):
Oh wait, you mean indirection in memory access?
Shreyas Srinivas (May 13 2024 at 08:08):
Why would there be more indirection with Subtypes? Subtypes are also structures
François G. Dorais (May 13 2024 at 08:18):
There's no difference since Subtype is very thin, but one advantage is that you can extend a structure.
François G. Dorais (May 13 2024 at 08:19):
It depends on the situation but I don't see the advantage of using Subtype here.
Shreyas Srinivas (May 13 2024 at 08:19):
I used it for brevity and the fact that there will be all sorts of free instances
Shreyas Srinivas (May 13 2024 at 08:19):
Coercion for one
Shreyas Srinivas (May 13 2024 at 08:21):
Also, I use a subtype theorem in the proof of the extAux
theorem
Eric Wieser (May 13 2024 at 08:25):
Shreyas Srinivas said:
Also, I use a subtype theorem in the proof of the
extAux
theorem
Adding an Equiv
with subtypes (in mathlib) makes this still possible
Eric Wieser (May 13 2024 at 08:25):
I think the "no subtypes" ship already sailed with Fin
no longer being a subtype
Shreyas Srinivas (May 13 2024 at 08:29):
Okay. I will probably refactor in the evening (it's morning where I am)
Shreyas Srinivas (May 13 2024 at 09:29):
Eric Wieser said:
Adding an
Equiv
with subtypes (in mathlib) makes this still possible
Loogle suggests that Equiv
is defined in Mathlib
so I can't use it here.
Shreyas Srinivas (May 13 2024 at 09:29):
@loogle Equiv
loogle (May 13 2024 at 09:29):
:search: Equiv, Equiv.propEquivBool, and 6186 more
Shreyas Srinivas (May 13 2024 at 09:29):
I am guessing equivalent types have a different name (iso?)
Eric Wieser (May 13 2024 at 09:36):
I don't understand your question.
Shreyas Srinivas (May 13 2024 at 09:36):
I assumed you were suggesting that I use Equiv
like mathlib. So I looked it up and found that it was also defined there, so I couldn't use it.
Eric Wieser (May 13 2024 at 09:37):
I meant put those results downstream in mathlib, because that's where Equiv
is
Shreyas Srinivas (May 13 2024 at 09:37):
The ext lemma should be in batteries
Eric Wieser (May 13 2024 at 09:37):
Sure, but that lemma is trivial anyway
Shreyas Srinivas (May 13 2024 at 09:37):
But the theorem I am using is probably a one-liner, so it doesn't matter
Shreyas Srinivas (May 13 2024 at 10:40):
I'd welcome any help with map_cons
in Data.Vector.Lemmas
Shreyas Srinivas (May 13 2024 at 10:40):
It seems to go into monad world with Array.mapM
and then I am lost
Shreyas Srinivas (May 13 2024 at 10:44):
My best guess is there is some theorem that can rewrite the Array.map
based equality with something based on List
Henrik Böving (May 13 2024 at 11:13):
You shouldn't need to work with the mapM
part of the API, the Array.map
API is rich enough to support the vast majority of use cases and if it is not the Array.map
API should be extended.
Additionally your implementation of cons
seems to be suboptimal to me. What this function will do is: Take a Vec
, turn it into a List
, List.cons
an element to that list and then turn that List
into a Vec
.
I believe it would already be more efficient to just do #[newHead] ++ xs
. If you write it like that you will also stay inside of the Array
API the whole time instead of switching between Array
and List
which will make your proofs easier.
Shreyas Srinivas (May 13 2024 at 11:16):
Is that useful proof-wise though?
Shreyas Srinivas (May 13 2024 at 11:16):
I don't think cons is meant to be used in code anyway.
Shreyas Srinivas (May 13 2024 at 11:17):
In arrays, consing on the end is more efficient (what Haskell calls last iirc)
Shreyas Srinivas (May 13 2024 at 11:17):
There is a push function for this purpose
Kim Morrison (May 13 2024 at 11:18):
Shreyas Srinivas said:
I don't think cons is meant to be used in code anyway.
So, delete it, or, if there is reason to keep it, switch to Henrik's suggested implementation.
Shreyas Srinivas (May 13 2024 at 11:18):
It was part of the mathlib vector API
Kim Morrison (May 13 2024 at 11:19):
:woman_shrugging: You don't need to copy everything. If something is bad, leave it out.
François G. Dorais (May 13 2024 at 11:23):
Shreyas Srinivas said:
It was part of the mathlib vector API
Oh! That's why you keep falling back on lists! You should revise that plan. All of the proofs you need should follow easily from the Array API. If that's not the case then either something is missing in the Array API or it doesn't belong in the Vector API. (Since the Array API is well tested, the latter is much more likely.)
François G. Dorais (May 13 2024 at 11:25):
I would start with just the functions for Vector.Basic. Don't waste too much time on lemmas that might be deleted anyway.
Shreyas Srinivas (May 13 2024 at 11:27):
I have replaced most list based defs with array functions. The remaining ones will be changed as I port more functions from Init.Data.Array files
Shreyas Srinivas (May 13 2024 at 11:28):
The other reason I use lists in one or two proofs is that either they are theorems about toList from Mathlib's Vector
Shreyas Srinivas (May 13 2024 at 11:31):
I intend to write toArray versions for those theorems, but it seems sensible to keep the toList versions for backward compatibility
François G. Dorais (May 13 2024 at 12:29):
Awesome work @Shreyas Srinivas! You seem to be on a roll. I'll be back later today for more reviews.
Shreyas Srinivas (May 13 2024 at 13:18):
I'll add further functions from Init.Data.Array.Basic once I satisfy all existing reviews. There are still all those monadic functions to adapt.
Shreyas Srinivas (May 13 2024 at 13:18):
I have removed all the lemmas from Mathlib.Vector which are list based for now. If needed they can be added back.
François G. Dorais (May 13 2024 at 13:35):
Excellent! Looks like there needs an import but everything compiles!
Shreyas Srinivas (May 13 2024 at 13:37):
I think it makes sense to split this PR. What is added now should be usable outside monadic contexts. Subsequent PRs would be a good way to add :
- Monadic defs : e.g. foldl, foldr, filter etc.
- Theorems
- Stuff from other files like binary search and sort
Shreyas Srinivas (May 13 2024 at 13:38):
Looking at the PR page, it looks rather cluttered as it is.
Shreyas Srinivas (May 13 2024 at 13:39):
If the consensus is that I continue working on everything in this PR, that's also fine.
Shreyas Srinivas (May 13 2024 at 13:43):
Another idea I had but would like to ask first : Monadic functions and the other functions that depend on monadic functions could go into a separate file. It seems useful both from an organisation perspective and from a parallel build perspective.
Shreyas Srinivas (May 13 2024 at 13:58):
There's an Array.size_insertAt
missing from the Array API
François G. Dorais (May 13 2024 at 14:07):
There's a CI error that Batteries.Data.Vector is not imported into Batteries, so you need to add some imports to Batteries.Data and Batteries.Data.Vector.
A sweep to check spacing and indentation would be nice but that's not critical.
I think it's okay to split into smaller PRs. I don't think this file is a compilation bottleneck so monadic functions could be in the same file but I think there is no harm in having separate files. I would say that's up to you.
Shreyas Srinivas (May 13 2024 at 14:09):
I think this last commit would pass CI minus lints
François G. Dorais (May 13 2024 at 14:12):
Ideally, Array.size_insertAt
goes in core but it can be parked in Array.Lemmas for the time being.
François G. Dorais (May 13 2024 at 14:15):
Lint errors don't look too bad.
Shreyas Srinivas (May 13 2024 at 14:17):
I think it makes sense to have a separate PR to modify that file?
François G. Dorais (May 13 2024 at 14:18):
If you need to use Array.size_insertAt
in this PR, then add it to this PR. If it's only needed for later then another quick PR is fine.
Shreyas Srinivas (May 13 2024 at 14:22):
Two of the unused argument
lints make no sense. You ideally want an ambient vector whose size you are asking in the size
function. Something similar for isEmpty
function, even if the information is available statically without them
François G. Dorais (May 13 2024 at 14:27):
I'm not sure isEmpty
is needed for vectors.
Shreyas Srinivas (May 13 2024 at 14:27):
That's a fair point
Shreyas Srinivas (May 13 2024 at 14:30):
But for size, one would still want an ambient vector
François G. Dorais (May 13 2024 at 14:30):
Just disable the linter for Vector.size
, it's a bogus one.
Shreyas Srinivas (May 13 2024 at 15:22):
Disabled and pushed.
Shreyas Srinivas (May 13 2024 at 16:03):
I think I have now added all the non-monadic functions from Init.Data.Array.Basic to Basic.lean (except zip, but the zip functions belong with unzip
which uses foldl
). The build workflow needs approval, but I am guessing the linter will have no further issues.
Shreyas Srinivas (May 13 2024 at 16:06):
Next question : what attributes need adding? Which functions get @[inline]
and @[specialize]
etc. What do these attributes do?
Shreyas Srinivas (May 13 2024 at 16:12):
I think once this PR builds at this point, it should be closed. I will open another PR for adding monadic functions, and then lemmas
Shreyas Srinivas (May 13 2024 at 16:24):
Also, every file with meaningful content seems to have some authorship comments at the top. What's the norm for this?
Shreyas Srinivas (May 13 2024 at 16:48):
Sidenote : Found some typos in the Lint comments I believe these linter names are now camelCase.
François G. Dorais (May 14 2024 at 04:01):
Shreyas Srinivas said:
Next question : what attributes need adding? Which functions get
@[inline]
and@[specialize]
etc. What do these attributes do?
The @[inline]
attribute should be applied to any def that is a thin wrapper for a lower level function. (Also special cases for performance issues but that's usually not to worry about until there is an actual performance issue.) Lean should probably do automatic inlining for many of those but that is not quite implemented yet as far as I know. (The backend does some of that automatically though.) When your def is a thin wrapper then you might want to make it an abbrev
, which makes it both @[inline]
and @[reducible]
(and some extra kernel stuff). Sometimes @[reducible]
is not desirable, then just do @[inline]
instead of abbrev
.
The @[specialize]
attribute can be used when your def has a "generic" parameter. For example, in Array.map
the function f
is a generic parameter. The @[specialize]
attribute instructs the compiler to generate a separate function for each instance of f
(when that is possible). This should be used if the primary use of the def is to be used repeatedly with the same generic parameter. This is not actually as common as you might think.
Shreyas Srinivas (May 15 2024 at 12:32):
batteries#793 is up for community review. Francois has conducted a rigorous and thorough review already.
Shreyas Srinivas (May 15 2024 at 15:12):
Hi everyone. I need some opinions here
Shreyas Srinivas (May 15 2024 at 15:14):
It turns out that Init doesn't have Array definitions with safe Nat
indices (i.e. the proof of i < n
for index i
and Array size n
is supplied with get_elem_tactic
as a default argument). But it would be nice to have such definitions for Vector, piggy-backing on the Array API with Fin
indexes ofc.
Shreyas Srinivas (May 15 2024 at 15:14):
The question is, what should the naming convention for these functions be
Shreyas Srinivas (May 15 2024 at 15:16):
Example of such a function :
def swapNat (v : Vector α n) (i j : Nat) (h_i : i < n := by get_elem_tactic) (h_j : j < n := by get_elem_tactic) : Vector α n :=
⟨Array.swap v.toArray (v.size_eq.symm ▸ ⟨i, h_i⟩) (v.size_eq.symm ▸ ⟨j, h_j⟩), proof⟩
where
proof := by
rw [size_swap]
exact v.size_eq
I don't like the name swapNat
. Maybe swapSafeNatIndex
conveys more meaning, but makes names too long.
Yaël Dillies (May 15 2024 at 15:16):
You could use SafeIdx
maybe?
Shreyas Srinivas (May 15 2024 at 15:20):
That sounds nice. @Frédéric Dupuis proposed swapN
Frédéric Dupuis (May 15 2024 at 15:28):
I think we really should go with a short name, I suspect this is actually the most useful version in the long run (i.e. in the middle of a for loop, this is what you'll want to use, etc).
Mario Carneiro (May 15 2024 at 15:42):
I think the "main" swap function should take Fin n
indices, but this seems like a reasonable alternative variation
Mario Carneiro (May 15 2024 at 15:43):
swapN
seems fine to me
Shreyas Srinivas (May 15 2024 at 16:39):
Done. There are now some ...N
functions for vectors.
Shreyas Srinivas (May 15 2024 at 17:27):
Batteries#793 is ready for review again. I think I have addressed all existing comments
Shreyas Srinivas (May 15 2024 at 17:55):
Note : The definitions could also be added for Arrays, but they belong in the core's Array Basic file
François G. Dorais (May 15 2024 at 19:52):
There's no problem adding new functions in Batteries.Array.Basic
.
Shreyas Srinivas (May 15 2024 at 23:39):
Done.
Shreyas Srinivas (May 16 2024 at 11:12):
This PR seems ready for merge. Once merged, I will start a PR for adding monadic array API to vectors
Shreyas Srinivas (May 16 2024 at 11:52):
Back during the port we had this #queue link on zulip for mathlib's PR queue (don't know if it still exists). Is there such a thing for batteries?
Kim Morrison (May 16 2024 at 11:53):
No, Batteries is typically much lower traffic, and the github PR tab is sufficient.
Shreyas Srinivas (May 16 2024 at 11:54):
okay. Makes sense. Mathlib might have been particularly high traffic during the port. EDIT: Oh wait the link still exists.
Kim Morrison (May 16 2024 at 11:54):
It remains extremely high traffic. Look at the #queue. :-)
Shreyas Srinivas (May 16 2024 at 12:06):
Btw, is there anything I need to do to change labels or something for the PR?
Kim Morrison (May 16 2024 at 12:15):
I haven't had time to look at this PR. I just opened it, and immediately see that you are introducing new definitions in the Array namespace, e.g. getN
and following. These should be in their own PR, prior to introducing Vector
.
Shreyas Srinivas (May 16 2024 at 12:17):
I can remove these. The vector definitions of functions like getN
use other Array equivalents. I added them in as Francois suggested them here.
Shreyas Srinivas (May 16 2024 at 12:17):
The Vector API I have added can happily work with the existing Array API
Shreyas Srinivas (May 16 2024 at 12:18):
The shrink lemmas I added are the only ones this PR truly needs.
Shreyas Srinivas (May 16 2024 at 12:21):
Creating a separate PR that is added before this PR might be a bit of a headache since I am working on the main branch of my fork, and I am 102 commits ahead on my PR than the main branch of batteries.
Kim Morrison (May 16 2024 at 12:21):
Sorry? :-)
Shreyas Srinivas (May 16 2024 at 12:21):
I don't understand
Kim Morrison said:
Sorry? :-)
Kim Morrison (May 16 2024 at 12:22):
You need to do it anyway, and I'm sorry that it is a lot of work.
Shreyas Srinivas (May 16 2024 at 12:22):
To clarify, I mean can't I make a separate PR after?
Kim Morrison (May 16 2024 at 12:23):
Mario may feel otherwise, but I wouldn't be happy merging the PR as is because of (at least) this issue. (That said, I haven't had the time, and I won't anytime soon, to look at the rest of the PR, so I'm not going to be the one merging anyway.)
Shreyas Srinivas (May 16 2024 at 12:28):
Okay, but it is going to be difficult to anticipate every missing piece of the array API in advance. Although these particular defs are added for completeness sake and can be removed without any trouble, it is going to be a bit of a PR tangle if every array change needs to be a separate PR.
Shreyas Srinivas (May 16 2024 at 12:28):
@Mario Carneiro thoughts?
Mario Carneiro (May 16 2024 at 12:29):
I am currently at a conference and don't have much time to devote here this week. I'll try to drop in when I can
Mario Carneiro (May 16 2024 at 12:31):
I agree with Kim, if this is a large PR then it should be split up, and if you contribute to both Array and Vector then that's a natural split point
Mario Carneiro (May 16 2024 at 12:31):
Do not be afraid to create sequences of dependent PRs
Shreyas Srinivas (May 16 2024 at 12:33):
Okay, so here's a plan for this particular instance: I'll create a new PR for the five array defs I added and move the defs there. This should leave batteries#793 with only two small theorems for Array.shrink
Kim Morrison (May 16 2024 at 12:33):
(It's fine to have multiple in sequences PRs open --- you can point to later ones as justification for the the earlier ones.)
Shreyas Srinivas (May 16 2024 at 12:34):
Everything else (99%) that is new will be in Vector
Shreyas Srinivas (May 16 2024 at 12:34):
Would this take the PR closer to merge?
Shreyas Srinivas (May 16 2024 at 13:05):
How do I ensure that my new PR is created against an existing PR as opposed to a branch?
Shreyas Srinivas (May 16 2024 at 13:06):
The new PR has commits from the old PR of course. It was created by branching off the old PR's branch
Shreyas Srinivas (May 16 2024 at 13:12):
Update : batteries#793 does not add N
defs anymore. Those are in batteries#800
Shreyas Srinivas (May 16 2024 at 13:15):
The only things batteries#793 adds to Array are two short two-liner theorems about shrink.
Shreyas Srinivas (May 16 2024 at 13:19):
I am at the limit of my git magic.
Yaël Dillies (May 16 2024 at 14:34):
You should mark batteries#793 as depending on batteries#800. See eg #12836 and #12845 for an example of how to do it
Yaël Dillies (May 16 2024 at 14:34):
batteries#793 should still contain the N
defs. The point is that once batteries#800 is merged, those defs will disappear from the diff
Shreyas Srinivas (May 16 2024 at 14:35):
It does
Yaël Dillies (May 16 2024 at 14:35):
Sorry, edited my message for clarity
Shreyas Srinivas (May 16 2024 at 14:36):
The only difference between the PRs right now is that batteries#793 doesn't contain N defs and batteries#800 does
Yaël Dillies (May 16 2024 at 14:36):
Ah sorry, batteries#793 is the dependency and batteries#800 is the dependent. I thought it was the other way around
Shreyas Srinivas (May 16 2024 at 14:40):
- Basically i took the branch of batteries#793, namely
main
and created a new branchArray_enhancements
. At this point both these branches contained theN
defs. - Then I deleted the
N
defs from themain
branch in the next commit. So now we have :* batteries#793 which doesn't contains
N
defs
* batteries#800 which does containN
defs.
Yaël Dillies (May 16 2024 at 14:41):
Okay, that works but I have a better workflow
Shreyas Srinivas (May 16 2024 at 14:42):
Maybe what works is if
- I apply the last commit of 793 which removed N defs to 800. This leaves 800 without n defs but makes the merge easy.
- and then add another fix commit which removes the N defs again from 800.
Shreyas Srinivas (May 16 2024 at 14:43):
I can accomplish a rebase of branch Array_enhancements
on main. and then explicitly add N defs to 800
Yaël Dillies (May 16 2024 at 14:43):
You have PR X from branch A. Somebody asks you to split off changes C from it. Go to master
and do git merge --squash A
. Remove from the staged changes all changes but C. Commit. Create a branch B. Open PR Y. Mark X as depending on Y.
Yaël Dillies (May 16 2024 at 14:44):
This workflow assumes that PRs are squash-merged to master (which is the case for Batteries, right?)
Shreyas Srinivas (May 16 2024 at 14:44):
yes, but it might hurt reviews
Shreyas Srinivas (May 16 2024 at 14:44):
there are a 102 commits to squash
Shreyas Srinivas (May 16 2024 at 14:44):
And there are loads of review comments
Yaël Dillies (May 16 2024 at 14:44):
Usually it's fine to discard those if they have already been addressed
Yaël Dillies (May 16 2024 at 14:45):
Shreyas Srinivas said:
there are a 102 commits to squash
Is the way the changes are spread across commits relevant to the review?
Yaël Dillies (May 16 2024 at 14:45):
Mine usually aren't
Shreyas Srinivas (May 16 2024 at 14:48):
You might lose the decision history behind the design choices
Yaël Dillies (May 16 2024 at 14:49):
I don't think this information should be hidden in commits that are going to be squashed to master anyway. If you want it, write it in the PR description
Shreyas Srinivas (May 16 2024 at 14:53):
How did you mark the dependency?
Shreyas Srinivas (May 16 2024 at 14:54):
In my case the mergify bot just said it had problems with the matching hashes of the head of both PRs or something
Shreyas Srinivas (May 16 2024 at 14:54):
Specifically, I am searching the github PR page for a place where I can do this.
Shreyas Srinivas (May 16 2024 at 14:58):
I think got batteries#800 to be a strict extension of batteries#793
Shreyas Srinivas (May 16 2024 at 14:58):
commit history wise
Yaël Dillies (May 16 2024 at 15:00):
Shreyas Srinivas said:
How did you mark the dependency?
Look at the source code of the PR description of #12845
Shreyas Srinivas (May 16 2024 at 15:00):
That just adds a checkmark. I did that a few minutes ago.
Yaël Dillies (May 16 2024 at 15:01):
Uh maybe we don't have the bot adding the blocked-by-other-PR
label
Shreyas Srinivas (May 16 2024 at 15:01):
I don't think I can add labels to batteries. Permissions issue I guess.
Yaël Dillies (May 16 2024 at 15:02):
@Mario Carneiro, could we make sure the PR process is as similar as possible between batteries and mathlib4?
Shreyas Srinivas (May 16 2024 at 15:03):
I think that's unlikely. Mathlib's PR process is a bit of an outlier because of some cloud cache generating business. Mario explained this to me long ago.
Yaël Dillies (May 16 2024 at 15:03):
Okay but the bot adding blocked-by-other-PR
has nothing to do with caching
Shreyas Srinivas (May 16 2024 at 15:04):
That might have something to do with me being a first time contributor to batteries.
Yaël Dillies (May 16 2024 at 15:04):
No I really don't think it's a you problem
Mario Carneiro (May 16 2024 at 15:07):
This sounds like @Kim Morrison 's jurisdiction. There is a bot that will let you apply awaiting-review
/awaiting-author
labels, and I think there is also a bot that adds/removes merge-conflict
. I have not much knowledge about this aspect of the CI setup for mathlib so I am leaving this to others
Mario Carneiro (May 16 2024 at 15:08):
But Shreyas is correct that Batteries doesn't completely copy Mathlib's github setup specifically because of the permissions issue
Mario Carneiro (May 16 2024 at 15:10):
In the absence of automation, you can either poke someone with triage perms on Batteries to add the label, or live without it
Shreyas Srinivas (May 16 2024 at 15:12):
Currently batteries#793 has 10 lines of Array code in Array.Lemmas. This code (two tiny size theorems basically) was provided by @François G. Dorais during the review process.
Shreyas Srinivas (May 16 2024 at 15:14):
I hope this is not a big barrier for getting this PR through.
Shreyas Srinivas (May 16 2024 at 15:30):
I have added a separate PR for these two lemmas batteries#801
But in order to maintain buildability I haven't removed them from 793 yet. After batteries#801 is merged, I will do a git pull and merge.
The PRs need to be accepted in the following order : batteries#801 -> batteries#793 -> batteries #800
Reviews welcome.
Mario Carneiro (May 16 2024 at 15:35):
when you have dependent PRs, the dependent should be an end-extension of the dependency, i.e. it should contain all the commits and then some more at the end. If you take the middle commits out then it will generally break the dependent PR, so don't do that. (If it doesn't break the dependent PR then they are not actually dependent and should be separate PRs.)
Shreyas Srinivas (May 16 2024 at 15:37):
I think I fixed that issue between 793 and 800. 793's commit history should be a prefix of 800
Mario Carneiro (May 16 2024 at 15:39):
793 has a ton of commits though, could you squash it if it's not serving any review purpose?
Shreyas Srinivas (May 16 2024 at 15:40):
Okay. Afaik, all comments were addressed though one or two haven't been marked resolved officially
Shreyas Srinivas (May 16 2024 at 15:59):
Mario Carneiro said:
793 has a ton of commits though, could you squash it if it's not serving any review purpose?
Squashed it all into one commit.
Shreyas Srinivas (May 16 2024 at 16:06):
Both batteries#793 and batteries#800
Shreyas Srinivas (May 16 2024 at 16:18):
@Mario Carneiro I have rebased 793 (and consequently 800) onto 801. batteries#801 adds a couple of Array lemmas that batteries#793 (and consequently batteries#800) use
Shreyas Srinivas (May 16 2024 at 16:18):
This means the PRs should now have no problem getting merged correctly
Shreyas Srinivas (May 16 2024 at 16:20):
I need someone to run the CI on 793 but considering the CI automatically and succesfully runs on 800 which is top of this foodchain, I think the CI won't complain about 793 either
François G. Dorais (May 16 2024 at 16:26):
Since Shreyas is not a git pro, let's try to give very clear instructions. 793 is a bit of a mess now.
Shreyas Srinivas (May 16 2024 at 16:30):
I think I might have fixed it all. I will make a separate PR for improving the unexpander doc
Shreyas Srinivas (May 16 2024 at 16:31):
Currently, adding anything new to 793 might cause issues with 800
Shreyas Srinivas (May 16 2024 at 16:31):
For reference the docstring you proposed @François G. Dorais was
The definitions `mkVector*` provide convenient ways to write terms for short vectors, especially in metaprogramming. Unexpanders are provided make these shortcuts invisible to users.
Shreyas Srinivas (May 16 2024 at 16:43):
All 3 PRs are green
Shreyas Srinivas (May 16 2024 at 17:14):
Update : 3 PRs ready for review
Shreyas Srinivas (May 16 2024 at 17:42):
(deleted)
Shreyas Srinivas (May 17 2024 at 16:50):
Hi all, batteries#803 is independent of the other PRs and ready for review
Shreyas Srinivas (May 17 2024 at 16:58):
batteries#801 can also be reviewed separately
Shreyas Srinivas (May 18 2024 at 10:39):
Hi, I need a bit of help: I am porting adding the remaining mostly monadic defs from arrays to vectors in a soon-to-be new PR when I came across this function:
@[inline] def getEvenElems (as : Array α) : Array α :=
(·.2) <| as.foldl (init := (true, Array.empty)) fun (even, r) a =>
if even then
(false, r.push a)
else
(true, r)
This seems an oddly specific function to have as opposed to, say, a general filterIdx
function which carries the array index and the accumulator, instead of the boolean flag.
Is there a reason for doing things this way?
Shreyas Srinivas (May 18 2024 at 10:41):
Note: I am suggesting filterIdx
as opposed to filter
because a filter
function already exists and doesn't carry the array index paired with the accumulator
Frédéric Dupuis (May 18 2024 at 10:41):
Indeed a filterIdx
function would be much more useful.
Shreyas Srinivas (May 21 2024 at 12:55):
Shreyas Srinivas said:
Update : 3 PRs ready for review
Review request:
I am a bit nervous to chain more PRs on top of these. I would like to request reviews of these PRs so that they get merged before I add anymore. Not quite sure whom to ping.
Yuyang Zhao (May 21 2024 at 17:38):
Would it be possible later to change the runtime or have another type to not store size and capacity to make it more lightweight? I think most of the time we do not need to change its size, otherwise we should use Array
?
Shreyas Srinivas (May 21 2024 at 17:53):
Vector is an Array with a static size parameter
Shreyas Srinivas (May 21 2024 at 17:54):
The runtime already supports dynamic arrays with some efficient C code
Henrik Böving (May 21 2024 at 17:57):
I think @Yuyang Zhao understands this. What they are asking for is the following: The way which your type currently looks:
/-- `Vector α n` is an `Array α` whose size is statically fixed to `n` -/
structure Vector (α : Type u) (n : Nat) where
/--Internally, a vector is stored as an array for fast access-/
toArray : Array α
/--`size_eq` fixes the size of `toArray` statically-/
size_eq : toArray.size = n
deriving Repr, BEq, DecidableEq
Means that the Array
will store its size and its capacity in a lean_object
+ you additionally have to carry around the n
parameter even though it is already present inside of the Array
. The idea would be to have a specialized lean_object
type that is basically represented in a way similar to yours in the logic but internally doesn't carry around a capacity and also doesn't carry around its current size twice. That would require changes to the runtime which in turn requires coordination with the FRO.
Shreyas Srinivas (May 21 2024 at 17:58):
Oh I see what you mean
Shreyas Srinivas (May 21 2024 at 17:58):
Apologies
Shreyas Srinivas (May 21 2024 at 17:59):
This would also require changes to the Vector API
Shreyas Srinivas (May 21 2024 at 17:59):
Currently there are a lot of functions which alter the size of the vector
Frédéric Dupuis (May 21 2024 at 18:02):
I'm not an expert on the Lean compiler, but my guess is that the performance gain would be minuscule compared to the effort it would require.
Henrik Böving (May 21 2024 at 18:08):
I think that's right. The memory footprint is only increased noticeably for very small arrays and the better cache should be miniscule enough for this not to matter.
Shreyas Srinivas (May 21 2024 at 18:13):
To add foolproof (i.e. non panicking) API functions, the Vector structure would effectively have to carry a parameter that serves the same role as capacity does for array and limit modifications that make vector size larger than this capacity, or use some null terminator like character whose position would have to be tracked. So the API would become painful.
Shreyas Srinivas (May 21 2024 at 18:13):
So proofs would get hideous
Shreyas Srinivas (May 21 2024 at 18:14):
Also, I don't recall the specific optimisations exactly, but C compilers can heavily optimise malloc calls, and pointer fetches.
Shreyas Srinivas (May 21 2024 at 18:23):
I see no point in trying to outdo GCC
Henrik Böving (May 21 2024 at 18:26):
Lean's output gets compiled with clang btw.
Shreyas Srinivas (May 21 2024 at 18:26):
Ah so we do have a way to get a wasm backend
Shreyas Srinivas (May 21 2024 at 18:27):
But my point applies to clang too.
Henrik Böving (May 21 2024 at 18:32):
Shreyas Srinivas said:
Also, I don't recall the specific optimisations exactly, but C compilers can heavily optimise malloc calls, and pointer fetches.
I mean I don't really see how this applies here? Currently you have this layout for Array: https://github.com/leanprover/lean4/blob/master/src/include/lean/lean.h#L155C1-L160C21 and then there is an additional (tagged pointer) integer floating around somewhere as the dimension parameter that's externalized in your structure. There is no way that a C compiler could figure this out at the precision that basically declaring a builtin type:
/-- `Vector α n` is an `Array α` whose size is statically fixed to `n` -/
structure Vector (α : Type u) (n : Nat) where
/--Internally, a vector is stored as an array for fast access-/
toArray : Array α
/--`size_eq` fixes the size of `toArray` statically-/
size_eq : toArray.size = n
deriving Repr, BEq, DecidableEq
with a memory representation of:
typedef struct {
lean_object m_header;
size_t m_size;
lean_object * m_data[0];
} lean_vector_object;
could have. It's just that doing this optimization is almost meaningless in the first place.
Yuyang Zhao (May 21 2024 at 18:44):
I'm not sure, because I'll be using small Vector
s a lot. But I can just trust you : )
Mario Carneiro (May 21 2024 at 18:51):
Yuyang Zhao said:
Would it be possible later to change the runtime or have another type to not store size and capacity to make it more lightweight? I think most of the time we do not need to change its size, otherwise we should use
Array
?
I thought about this, and my conclusion was that it's not possible, because the garbage collector (which has no type information and can only see the graph structure of lean_object
s because they all have enough tagging to differentiate) needs to know both how large the original allocation is and how many of the cells are initialized to clean up properly after it.
Shreyas Srinivas (May 21 2024 at 20:06):
Henrik : I wasn't looking at the implementation specifics of lean. I see your point.
Shreyas Srinivas (May 21 2024 at 20:07):
Shreyas Srinivas said:
Shreyas Srinivas said:
Update : 3 PRs ready for review
Review request:
I am a bit nervous to chain more PRs on top of these. I would like to request reviews of these PRs so that they get merged before I add anymore. Not quite sure whom to ping.
Making sure this doesn't message doesn't get lost in the discussion.
Kim Morrison (May 21 2024 at 23:02):
Shreyas Srinivas said:
Review request:
I am a bit nervous to chain more PRs on top of these. I would like to request reviews of these PRs so that they get merged before I add anymore. Not quite sure whom to ping.Making sure this doesn't message doesn't get lost in the discussion.
Currently only @Mario Carneiro and I have merge rights on Batteries, and both of us seem to be pretty busy. This bottleneck needs an solution soon, and we are working on it!
Further reviews by others are super helpful, in any case (whether asking for more changes or giving approval).
François G. Dorais (May 22 2024 at 13:58):
The smaller related PRs batteries#801 and batteries#803 each have two independent approvals. They are ready to merge.
James Gallicchio (May 22 2024 at 21:28):
Mario Carneiro|110049 said:
I thought about this, and my conclusion was that it's not possible, because the garbage collector (which has no type information and can only see the graph structure of
lean_object
s because they all have enough tagging to differentiate) needs to know both how large the original allocation is and how many of the cells are initialized to clean up properly after it.
(until we reach the promised land of linear type annotations, at which point it should be possible to have static-length arrays with the length not stored in the object)
Mario Carneiro (May 22 2024 at 21:29):
no, I'm saying that's not sufficient without the promised-er land of a compiler that actually makes use of typing information for memory layout and with static garbage collection a la rust
Mario Carneiro (May 22 2024 at 21:31):
An array without length information is similar to an unboxed tuple, it can't be passed to anything type-erasing without overhead
Kim Morrison (May 23 2024 at 01:11):
François G. Dorais said:
The smaller related PRs batteries#801 and batteries#803 each have two independent approvals. They are ready to merge.
I've hit merge on both of these, without looking. :-(
François G. Dorais (May 23 2024 at 01:19):
Thanks Kim!
@Shreyas Srinivas: you can now git merge main
into the batteries#793 PR to update.
Shreyas Srinivas (May 23 2024 at 09:31):
Done. PR batteries#793 passes CI and has one approval. Waiting for more reviews or a merge. Now the CI runs automatically. As I suspected, first time PRs need maintainers approval for running CI (Francois did confirm this)
Shreyas Srinivas (May 23 2024 at 16:29):
The PR is again all-green. Francois golfed the proofs.
Shreyas Srinivas (May 24 2024 at 07:41):
Apologies for messaging again. I am looking for approvals/merge/more reviews on batteries#793
There is already one approving review. The sooner this PR gets through the sooner we can have static vectors in batteries.
Eric Wieser (May 24 2024 at 22:32):
Reiterating my comment elsewhere, I don't think this PR should be merged until an adaptation PR is passing CI in mathlib (as otherwise mathlib gets stuck for a while and can't pick up other batteries releases)
Shreyas Srinivas (May 24 2024 at 22:54):
Okay. I can do this ~24 hours from now
Shreyas Srinivas (May 24 2024 at 22:57):
The simplest companion PR puts Mathlib's Vector in a namespace and calls the namespaced version in the two places it is used.
Shreyas Srinivas (May 24 2024 at 22:57):
Once this is merged, batteries#793 should be safe to merge.
Shreyas Srinivas (May 24 2024 at 22:59):
Once batteries Vector is available, Mathlib's Vector can be transitioned to this API and the namespace removed. As I see it, this is the smoothest path, involving the least tinkering I can think of.
Shreyas Srinivas (May 24 2024 at 23:03):
It also gives us great flexibility to work with the Mathlib Vector API
CC: @Mario Carneiro , @Kim Morrison and @Eric Wieser to avoid discussing across three threads
Mario Carneiro (May 24 2024 at 23:03):
There is no need to ping this PR every day. We're aware of it
Shreyas Srinivas (May 24 2024 at 23:04):
No this is about how to manage this and Mathlib's Vector. The discussion is spread across three threads
Shreyas Srinivas (May 24 2024 at 23:05):
Since Eric mentioned the companion PR as a requirement to merging this PR, I thought this is the best place to decide on what is to be done.
Mario Carneiro (May 24 2024 at 23:05):
the mathlib PR should probably be discussed on #mathlib4
Shreyas Srinivas (May 24 2024 at 23:06):
My proposed plan involves both this PR and the Mathlib PR
Mario Carneiro (May 24 2024 at 23:06):
but I agree that having a plan to not break mathlib is a prerequisite for landing this PR
Shreyas Srinivas (May 24 2024 at 23:07):
Shreyas Srinivas said:
The simplest companion PR puts Mathlib's Vector in a namespace and calls the namespaced version in the two places it is used.
Specifically the plan starts with this message
Mario Carneiro (May 24 2024 at 23:07):
OK, sgtm
Mario Carneiro (May 24 2024 at 23:08):
that can probably be landed in advance anyway
Shreyas Srinivas (May 24 2024 at 23:09):
Yes. This plan works without creating dependencies.
François G. Dorais (May 25 2024 at 03:42):
Kim Morrison said:
There is general enthusiasm in the FRO for working on this as
Batteries.Vector
rather than_root_.Vector
.
If namespaced in Batteries, as Kim suggested, then there are no downstream issues at all. That would also make migration simpler if core puts a new Vector type in the root namespace.
Shreyas Srinivas (May 25 2024 at 07:01):
If the FRO doesn't want _root_.Vector, then Mathlib's Vector would also need to go into a namespace
Eric Wieser (May 25 2024 at 07:34):
Shreyas Srinivas said:
I thought this is the best place to decide on what is to be done.
Surely your previous thread in #mathlib 4 is the right place? Please try not to restart discussions in different channels.
Eric Wieser (May 25 2024 at 07:35):
(maybe this is my fault for not linking that thread in my message above)
Shreyas Srinivas (Jun 05 2024 at 11:50):
Hi, it has been over a week. I just want to check in and ask if I can get this PR reviewed or merged. It already has an approving review. The mathlib PR to test this was approved last week. I would like to use this API soon if I can. I apologise if this is too soon to send a reminder.
Shreyas Srinivas (Jul 09 2024 at 17:04):
I tried upgrading this batteries#793 to 4.10.0-rc1. When doing so, I ran into a surprising issue. Previously, in the Lemmas.lean file:
@[simp]
theorem map_empty (f : α → β) : map f empty = empty :=
rfl
The above just worked. Now rfl
doesn't work. I guess this is due to some changes in the way either rfl
or reducibility works by default, but I can't pinpoint the cause or the fix.
Shreyas Srinivas (Jul 09 2024 at 17:08):
PS: I also noticed there is no default gitpod config for batteries and no link shows up on PRs. I have been using gitpod for mathlib recently and now I wonder if it makes sense to add something for batteries.
Jannis Limperg (Jul 09 2024 at 17:16):
Is map
defined by well-founded recursion? If so, it's now irreducible by default.
Shreyas Srinivas (Jul 09 2024 at 17:16):
I tried unseal map in..
Shreyas Srinivas (Jul 09 2024 at 17:17):
The result :
failed to set `[local semireducible]`, `Batteries.Vector.map` is currently `[semireducible]`, `[irreducible]` expected
use `set_option allowUnsafeReducibility true` to override reducibility status validation
Kim Morrison (Jul 09 2024 at 17:18):
So it is something further down the definitional chain that has become irreducible.
Shreyas Srinivas (Jul 09 2024 at 17:20):
Is there a way to find out which one?
Shreyas Srinivas (Jul 09 2024 at 17:21):
Some debug trace for rfl?
Henrik Böving (Jul 09 2024 at 17:22):
#reduce map f empty
with a sufficient binder for f
Shreyas Srinivas (Jul 09 2024 at 17:23):
I think it is Array.mapM.map
Henrik Böving (Jul 09 2024 at 17:23):
Can you prove it.
Shreyas Srinivas (Jul 09 2024 at 17:25):
No, but digging down, it is the closest well founded recursive function
Henrik Böving (Jul 09 2024 at 17:25):
Well just run reduce and see where you get stuck?
Henrik Böving (Jul 09 2024 at 17:26):
Oh maybe reduce uses kernel reduction which ignores this actually :/
Henrik Böving (Jul 09 2024 at 17:27):
Well either way if Array.map is at fault you should be able to just unfold your map
and use a simp call to verify your theorem, that seems better than relying on some defeq anyways
Shreyas Srinivas (Jul 09 2024 at 17:29):
So unseal Array.mapM.map
worked.
Shreyas Srinivas (Jul 09 2024 at 17:30):
But a funny thing is unseal
had to be put in before the docstring.
Shreyas Srinivas (Jul 09 2024 at 17:30):
It was unintuitive at first.
Shreyas Srinivas (Jul 09 2024 at 17:31):
It was a syntax error anywhere else. EDIT : I would think putting unseal just under the @[simp]
line and above the def
line was the correct syntax.
Henrik Böving (Jul 09 2024 at 17:31):
Using the unseal here seems like a pretty bad idea to me, you are relying on some hyperspecifics of the way that your underlying API is implemented. Why not unfold + simp instead.
Shreyas Srinivas (Jul 09 2024 at 17:32):
You get stuck in a loop. simp [map, empty, Array.map, Array.mapM, Array.mapM.map]
gives a time out.
Shreyas Srinivas (Jul 09 2024 at 17:33):
I think there's also an Id.run
somewhere in there.
Henrik Böving (Jul 09 2024 at 17:33):
But you aren't supposed to unfold Array.map and mapM with your loop, there should be a theorem somewhere that claims mapping over the empty array yields the empty array
Shreyas Srinivas (Jul 09 2024 at 17:33):
There isn't one.
Shreyas Srinivas (Jul 09 2024 at 17:34):
At least not now.
Henrik Böving (Jul 09 2024 at 17:34):
Well then that might be a good contribution to make instead of comitting to unstable proof schemes ^^
Shreyas Srinivas (Jul 09 2024 at 17:34):
I do recall getting the theorem name from Array but a quick loogle search for the same name turned up empty.
Shreyas Srinivas (Jul 09 2024 at 17:36):
So if it was indeed removed, I am not sure adding it is wise on my part.
Shreyas Srinivas (Jul 09 2024 at 17:41):
On second thoughts, I might have obtained it from Mathlib Vectors.
Shreyas Srinivas (Jul 09 2024 at 17:46):
I need to commute now. I'll make a PR once I am home.
Shreyas Srinivas (Jul 09 2024 at 18:05):
OTOH, I did want to ask again if the intended syntax for unseal
is to put it before docstrings. For large docstrings, this might make it hard to keep track this line
François G. Dorais (Jul 09 2024 at 19:22):
I don't think simp
is the right way to deal with broken rfl
proofs due to irreducible defs. @Joachim Breitner and I briefly discussed this in batteries#832 but I think we both went on to more pressing matters and never looked back.
For the moment, I tend to prefer with_unfolding_all rfl
even though that throws a big huge rug over the problem. But simp
is another big rug, which is admittedly easier to trim to size, but it's also not guaranteed to work at all as a replacement.
Joachim Breitner (Jul 09 2024 at 19:54):
I’d say unseal
is better than with_unfolding_all
, but would say simp
or rw
is even more proper – they use the .eq_*
equational theorems that are the public API for a function defined by well-founded recursion, whereas rfl
considers the internal construction.
Joachim Breitner (Jul 09 2024 at 19:55):
Shreyas Srinivas said:
OTOH, I did want to ask again if the intended syntax for
unseal
is to put it before docstrings. For large docstrings, this might make it hard to keep track this line
unseal … in
work like set_option … in
and precedes a full “command”. The docstring is part of the def
command syntax. So while unintuitive, it is justifiable.
Maybe there ought to be unseal … in <tactic>
.
Shreyas Srinivas (Jul 09 2024 at 19:58):
Joachim Breitner said:
I’d say
unseal
is better thanwith_unfolding_all
, but would saysimp
orrw
is even more proper – they use the.eq_*
equational theorems that are the public API for a function defined by well-founded recursion, whereasrfl
considers the internal construction.
So is it justified to use rfl for an Array.map_empty
API function?
François G. Dorais (Jul 09 2024 at 21:11):
Joachim Breitner said:
I’d say
unseal
is better thanwith_unfolding_all
, but would saysimp
orrw
is even more proper – they use the.eq_*
equational theorems that are the public API for a function defined by well-founded recursion, whereasrfl
considers the internal construction.
I agree that unseal
is better than with_unfolding_all
because it's a localized fix rather than a big rug. But it relies on knowing exactly what def to unseal, which is not trivial and we have no automated tools for.
My issues with simp are that (a) it does a whole lot more than just applying .eq_*
lemmas, and (b) it works bottom-up instead of top-down.
I still think we need a tactic that is specialized for dealing with this general issue.
Joachim Breitner (Jul 09 2024 at 21:34):
When I say simp
I am inclined to mean something like simp only […]
. With that I am not sure I see the problem. But you can also use rw
if simp
seems too poweful.
In this particular case I think the problem is that Array.map
itself has an imcomplete API, it seems, and there should be an Array.map_empty
that properly hides the construction of Array.map
, and then Vector.map
would be an thin wrapper around Array.map_empty
.
Shreyas Srinivas (Jul 09 2024 at 21:55):
François G. Dorais said:
I still think we need a tactic that is specialized for dealing with this general issue.
I guess what's needed is to set reducibility at a per tactic level or tactic variants with different reducibility settings.
Shreyas Srinivas (Jul 09 2024 at 21:55):
Like with_reducible rfl
Shreyas Srinivas (Jul 09 2024 at 23:13):
Is there a consensus on adding Array.map_empty
? Is it okay to use unseal Array.mapM
and rfl
there?
Shreyas Srinivas (Jul 09 2024 at 23:28):
I'll be offline for 12 hours or more. If a consensus is reached and anyone wishes to PR it in the meantime, I welcome it.
François G. Dorais (Jul 09 2024 at 23:30):
Joachim Breitner said:
When I say
simp
I am inclined to mean something likesimp only […]
. With that I am not sure I see the problem. But you can also userw
ifsimp
seems too poweful.
Yes, this is what I meant when I said that simp is "easier to trim to size". But this is still not a good solution, simp is easy to run into loops and mayhem, whereas (with_unfolding_all
or not) rfl
stops as soon as unification fails and can never loop.
We do have tools to trim down simps, but simp needs to work first!
Joachim Breitner said:
In this particular case I think the problem is that
Array.map
itself has an imcomplete API, it seems, and there should be anArray.map_empty
that properly hides the construction ofArray.map
, and thenVector.map
would be an thin wrapper aroundArray.map_empty
.
That is true and it should be fixed. But you need to be able to identify the problem first! The sudden failure of rfl
in this case doesn't identify the problem at all and it's hard to dig at it.
François G. Dorais (Jul 10 2024 at 02:27):
Is it time to revive unification hints? They're already implemented but not very much used.
Yaël Dillies (Jul 10 2024 at 03:44):
François G. Dorais said:
Joachim Breitner said:
I’d say
unseal
is better thanwith_unfolding_all
, but would saysimp
orrw
is even more proper – they use the.eq_*
equational theorems that are the public API for a function defined by well-founded recursion, whereasrfl
considers the internal construction.My issues with simp are that (a) it does a whole lot more than just applying
.eq_*
lemmas, and (b) it works bottom-up instead of top-down.
It can be made to work top-down inside an expression?
Shreyas Srinivas (Jul 10 2024 at 11:17):
batteries#877 is rather short
Shreyas Srinivas (Jul 10 2024 at 11:20):
Hopefully it can be merged soon
Shreyas Srinivas (Jul 10 2024 at 12:19):
To clarify, batteries#877 adds the array API lemma map_empty
. It is ready for review (can't add labels)
Shreyas Srinivas (Jul 19 2024 at 21:37):
Update : This issue has been properly fixed. batteries#793 is again ready for review/merge. Vector.map_empty uses the corresponding Array.map_empty theorem.
Last updated: May 02 2025 at 03:31 UTC