Zulip Chat Archive

Stream: lean4

Topic: Understanding the docstring for docs#Array.data

Eric Wieser (Oct 26 2023 at 14:45):

I'm confused by docs#Array.data and how it relates to docs#Array.toList; the former is documented as:

structure Array (α : Type u) where
  /-- Convert a `List α` into an `Array α`. This function is overridden
  to `List.toArray` and is O(n) in the length of the list. -/
  mk ::
  /-- Convert an `Array α` into a `List α`. This function is overridden
  to `Array.toList` and is O(n) in the length of the list. -/
  data : List α

Does "X is overridden to Y" mean "X is implemented_by Y"? Does it mean "X is an implementation detail, use Y instead"?

Alex J. Best (Oct 26 2023 at 14:51):

It looks like it means the line below that in the source attribute [extern "lean_array_data"] Array.data which is similar in meaning to implemented_by yes

Joachim Breitner (Oct 26 2023 at 14:59):

And that C function is simply calling Array.toList in Lean land. This detour is I think needed implemented_by would insis on equal types. But otherwise it's the same. What's a better wording? “This projection is implemented by Array.toList” maybe?

Alex J. Best (Oct 26 2023 at 15:09):

How do the types differ here?

Joachim Breitner (Oct 26 2023 at 15:12):

Hmm, nevermind, I confused it with something else I tried with implemented_by.

Joachim Breitner (Oct 26 2023 at 15:15):

So the C code merely translates between the different ABIs for extern and export:
https://github.com/leanprover/lean4/blob/6c5f79c0df01e9cd74ac4ee411ec392a5ac1b5d3/src/runtime/object.cpp#L346
(Why is it even different?)

So why isn't it using implemented_by? Because it'd an unsupported forward reference?

Eric Wieser (Oct 26 2023 at 15:25):

Joachim Breitner said:

What's a better wording? “This projection is implemented by Array.toList” maybe?

Perhaps "implemented at runtime". Also there should be a docstring on Array.toList that says "do not use this function, it is an implementation detail; use Array.data instead"

Joachim Breitner (Oct 26 2023 at 15:33):

Or the other way around (I think using Array.toList, and treating Array as an opaque type, is more natural, but not sure.)

Eric Wieser (Oct 26 2023 at 15:35):

Indeed; I guess my point is that the documentation makes it very hard to tell which pair of functions is the implementation detail! (it doesn't help that I would expect "overridden by" not "overridden to"...)

Joachim Breitner (Oct 26 2023 at 15:37):

I think the important message here is that .mk and .data are not just constructor and projection, as you would expect, and have very different performance characteristics. I’ll draft a PR and let you refine it.

Eric Wieser (Oct 26 2023 at 15:41):

Would renaming .mk to ofList and data to toList be reasonable, and renaming the existing toList to private toListImpl or similar?

Joachim Breitner (Oct 26 2023 at 15:45):

Probably. But when doing proofs, where these really are just constructor and projection, it’s nice to have .mk around, and to have a nicer variable name in inductions? That’s the only reason I can think of right now why you might want to keep them separate.

Joachim Breitner (Oct 26 2023 at 15:47):

Can you have a look at lean4#2771, refine if needed, and express support otherwise? Then I’ll mark it ready for review.

Joachim Breitner (Oct 26 2023 at 15:48):

(I did not explicitly describe one as “internal only”)

Eric Wieser (Oct 26 2023 at 15:55):

I think this doesn't solve the fundamental question I'm trying to answer, which is "how do I pick between .mk and .ofList?"

Joachim Breitner (Oct 26 2023 at 16:12):

Hmm. My intuition is: Pick .ofList if you are treating Arrays as an opaque data structure (e.g. when programming). Pick .mk if you are treating it as a wrapper around Lists (e.g. when proving).
But I don’t know if this intuition is a good one, and if it is, how to concisely convey it.

Eric Wieser (Oct 26 2023 at 16:23):

I'd claim that you never want .ofList, because it equal to and has the same performance characteristics as .mk but is harder to prove things about

Joachim Breitner (Oct 26 2023 at 16:28):

But it’s much easier to find, if you have a List and want an Array, and don’t know about the internal definition of Array :-D

Eric Wieser (Oct 26 2023 at 16:29):

Well then mk has the wrong name!

Joachim Breitner (Oct 26 2023 at 16:29):

Right

Joachim Breitner (Oct 26 2023 at 16:30):

I guss I wouldn’t oppose a PR that renames .mk and .data and makes the other functions private.

Eric Wieser (Oct 26 2023 at 16:30):

Joachim Breitner said:

Probably. But when doing proofs, where these really are just constructor and projection, it’s nice to have .mk around

I don't really buy that; in a proof you'd probably just use the angle brackets anyway, and .ofList is hardly much worse to type

and to have a nicer variable name in inductions?

Doesn't induction pretty much force us to name the induction argument manually anyway? I agree toList is slightly weirder than data as the induction variable, but there's not much in it

Joachim Breitner (Oct 26 2023 at 16:31):

That’s true, the auto-picked name is rarely used in the end. I’m convinced.

Mac Malone (Oct 26 2023 at 18:08):

Joachim Breitner said:

So why isn't it using implemented_by? Because it'd an unsupported forward reference?

The forward reference concern can be elided by using e.g. attribute [implemented_by Array.toList] Array.data after the relevant function is defined. I think this may be an interesting question to ask @Sebastian Ullrich or Leo because it seems to me that, if possible, using implemented_by would be much clearer (as it would be deducible from the Lean level that one implements the other without needing to look at the C code).

Joachim Breitner (Oct 26 2023 at 18:33):

If you set thr attribute later, will it still affect the compilation of all definitions in between?

Mac Malone (Oct 26 2023 at 18:54):

@Joachim Breitner No, but the in-between definitions shouldn't be using Array.mk/data (in their compiled code) anyway. For Array.mk, this certainly true as List.toArray is defined a little ways down in the same file. Unfortunately, Array.toList is defined separately in Init.Data.Array.Basic, so it is harder to keep Array.data honest (even though I suspect it still is). Regardless, implemented_by simply does not work across files (understandably), so that separation does prevent it from being use for Array.data. Thus, now having relooked at it, supporting this file separation is probably (at least one reason) why the export/extern trick was used.

Mac Malone (Oct 26 2023 at 18:54):

As an aside, If it was possible to mark the structure constructors and projections noncomputable, one could have Lean force them not to be used in compiled code until they are provided with computable definitions via implemented_by.

Mario Carneiro (Oct 26 2023 at 21:53):

FWIW I agree with Eric here: data should be preferred, but it should be renamed to toList and toList should never be used directly (it can even be private, although it is nice to have the proof of toList = toListImpl somewhere to make sure this isn't us pulling a fast one).

Mac Malone (Oct 27 2023 at 02:33):

@Mario Carneiro I imagine it should be the reverse (toListImpl = toList) as a @[simp] lemma to make sure it disappears if it accidently used somewhere.

Mario Carneiro (Oct 27 2023 at 02:33):

If it's not accessible without hacks (or even if it only has a weird name), I'm not too worried about it being accidentally used

Mario Carneiro (Oct 27 2023 at 02:34):

but sure, that ordering sounds fine

Mario Carneiro (Oct 27 2023 at 02:34):

I haven't seen any issues with people calling List.mapTR by mistake

Mario Carneiro (Oct 27 2023 at 02:35):

Although those functions do show up in docgen and loogle; it might be nice to have some @[hidden] attribute to indicate that it isn't meant for general use

Joachim Breitner (Oct 27 2023 at 07:36):

Related to functions showing up in docs that shouldn’t, also see https://github.com/leanprover/lean4/issues/2719 about let/where functions showing up in docs and in doc linters.

Kim Morrison (Sep 10 2024 at 03:00):

A heads up that this change (renaming Array.data to Array.toList) is live at lean#5288, and will be merged soon. The fallout in Batteries is moderate but straightforward, and the fallout in Mathlib is minimal. There are deprecations in place for all renamed lemmas, so hopefully this will be straightforward for anyone downstream.

Last updated: May 02 2025 at 03:31 UTC