Zulip Chat Archive

Stream: lean4

Topic: Computed Fields feature


Leonardo de Moura (Jul 11 2022 at 22:43):

We have merged the "Computed Fields" PR. It is an amazing new feature. See release notes for additional details: https://github.com/leanprover/lean4/blob/master/RELEASES.md
The following Lean types are using this feature: Lean.Expr, Lean.Level, and Lean.Name. So, it is quite disruptive for meta-programmers that use these types. Please update your projects, and thanks for your patience.

Leonardo de Moura (Jul 11 2022 at 22:44):

Many thanks to @Gabriel Ebner for developing this feature!

Kevin Mullins (Jul 11 2022 at 22:45):

The notation a[i] is not defined as follows

Is "not" a typo for "now"?

Patrick Massot (Jul 11 2022 at 22:45):

In the release notes, I guess that

The notation a[i] is not defined as follows

is meant to be

The notation a[i] is now defined as follows

Leonardo de Moura (Jul 11 2022 at 22:46):

Thanks!

Leonardo de Moura (Jul 11 2022 at 22:46):

I will fix it.

Jason Rute (Jul 11 2022 at 23:26):

Nice! I used this sort of thing a lot in Scala case classes (which are like inductive types). In Scala you could define, say, Expr.hash in three different ways: (1) as a method that was recomputed every time, (2) as a value that was computed on creation of the object, (3) as a lazy value that was computed and cached only the first time it was called, which was good for some expensive calculations. All three were called the same way, e.hash, which made refactoring easy.
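
For Lean readers, a rough analogue of those three flavors (a hypothetical sketch; none of these names come from the release notes):

def recomputed (xs : List Nat) : Nat := xs.foldl (· + ·) 0  -- (1) recomputed on every call

structure Precomputed where
  xs  : List Nat
  sum : Nat := xs.foldl (· + ·) 0  -- (2) filled in once when the value is constructed

structure Memoized where
  xs  : List Nat
  sum : Thunk Nat := ⟨fun _ => xs.foldl (· + ·) 0⟩  -- (3) computed on first .get, then cached at runtime

Computed fields automate flavor (2) for inductive types.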

Tomas Skrivan (Jul 12 2022 at 00:26):

What are the common use cases for this feature?

Mario Carneiro (Jul 12 2022 at 00:34):

Any kind of O(1) precomputed information which is structurally recursive on the term, like the size of a subtree
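
A minimal sketch of such an augmented tree with the new syntax (hypothetical type; the with-block form mirrors how Lean.Name declares its hash field):

inductive Tree where
  | leaf
  | node (left right : Tree)
with
  -- number of nodes, computed once at construction and stored in every node
  @[computedField] size : Tree → Nat
    | .leaf => 1
    | .node l r => l.size + r.size + 1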

Mario Carneiro (Jul 12 2022 at 00:35):

The obvious immediate application is the Data field of Expr, Name, and Level, which was an ergonomic wart compared to the Lean 3 implementation of these types (which cheated by using a custom implementation that stored the computed field where Lean couldn't reach it)

Mario Carneiro (Jul 12 2022 at 00:37):

This technique of "augmented trees" is very common in data structures

Mario Carneiro (Jul 12 2022 at 00:39):

Here's a whole lecture on it: https://www.youtube.com/watch?v=xVka6z1hu-I

Wojciech Nawrocki (Jul 12 2022 at 00:55):

Very cool! Is the reason for taking BinderInfo out of Data basically that BinderInfo is "actual" data rather than a pure function of the rest of the Expr?

Mario Carneiro (Jul 12 2022 at 00:58):

yes, this doesn't work if it's not a pure function

Mario Carneiro (Jul 12 2022 at 01:00):

it would be cool if we could bit-pack those together but that sounds difficult given lean's uniform ABI

Wojciech Nawrocki (Jul 12 2022 at 01:02):

Mario Carneiro said:

it would be cool if we could bit-pack those together but that sounds difficult given lean's uniform ABI

Yeah, there is now a new BinderInfo field in Expr.lam/forallE with (I am guessing) some small memory overhead.

Wojciech Nawrocki (Jul 12 2022 at 01:03):

An interesting side effect of the change is that letE no longer has BinderInfo. Not that this made much sense, so maybe the side effect is really a bug fix. However there is a mysterious Bool field in | letE : Name → Expr → Expr → Expr → Bool → Expr -- is that the nonDepLet flag?

Mario Carneiro (Jul 12 2022 at 01:04):

that sounds like it should be a computed field

Wojciech Nawrocki (Jul 12 2022 at 01:05):

Oh okay, it is. In the data computation we have

    | .letE _x t v b nonDep =>
      ...
      mkDataForLet ... nonDep

which is both stored and computed.

Mario Carneiro (Jul 12 2022 at 01:07):

It is an input in mkLet, so it seems like it's not actually a computed field. Not sure what the story is there

Mario Carneiro (Jul 12 2022 at 01:08):

I mean, the data in Expr.data is not sufficient to determine whether a given variable in fact appears in the body, so even though it is a pure function of the expr inputs it would be computationally expensive to determine it

Wojciech Nawrocki (Jul 12 2022 at 01:09):

I am assuming: previously, Data was a mix of stored and computed fields; now, we have an actual computed field; but, for backwards compatibility, we still expose the stored parts (binder infos and the non-dep let flag) within Data.

Gabriel Ebner (Jul 12 2022 at 06:44):

Wojciech Nawrocki said:

An interesting side effect of the change is that letE no longer has BinderInfo. Not that this made much sense, so maybe the side effect is really a bug fix. However there is a mysterious Bool field in | letE : Name → Expr → Expr → Expr → Bool → Expr -- is that the nonDepLet flag?

The nonDep flag is purely a placeholder for a future possible optimization in the compiler. It doesn't have any effect at the moment, but it is very much a flag and cannot be (purely) computed from the expression since it depends on type checking. The idea is that nonDepLet := true flags let-expressions which do not require ζ-reduction, that is, which could be replaced by a let_fun:

let (nonDep := false) X : Type := Nat; (id Nat.zero : X) -- type checks
let (nonDep := true)  X : Type := Nat; (id Nat.zero : X) -- doesn't type check

(Purely hypothetical, of course. To reiterate, the flag is completely unused at the moment.)

Gabriel Ebner (Jul 12 2022 at 06:51):

Wojciech Nawrocki said:

I am assuming: previously, Data was a mix of stored and computed fields; now, we have an actual computed field; but, for backwards compatibility, we still expose the stored parts (binder infos and the non-dep let flag) within Data.

I tried to keep the diff minimal for the Expr-refactoring. There are indeed a lot of parts that can be cleaned up or changed now (without affecting the API and without big refactorings, which is the whole point of the feature). For example the nonDepLet and binderInfo bits in Data are indeed vestigial and can be removed. With the extra 4 bits we could support 1048575 bound variables instead of 65535 (one can dream). We could also split the hash, the bitmask, the approximate depth, and the loose variable bound into different fields. (Maybe allowing four billion bound variables. :heart_eyes:)

Gabriel Ebner (Jul 12 2022 at 06:56):

it would be cool if we could bit-pack those together but that sounds difficult given lean's uniform ABI

There's nothing in Lean's ABI that prevents bit-packing of fields in a constructor (uniformity only means we can't pack Bool × Bool etc.)

Tomas Skrivan (Jul 12 2022 at 08:36):

If I have a List Nat and I compute length with this new feature, am I doubling memory consumption? I.e. does every tail store its length?

Gabriel Ebner (Jul 12 2022 at 08:42):

It would indeed store the length in every cons cell, but that's only a 33% increase per cons cell (which is 24 bytes at the moment if I'm not mistaken). The impact on the total memory consumption is less since not everything is a cons cell.
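
As a hypothetical sketch of what Tomas describes (using a monomorphic stand-in for List Nat; the names are made up):

inductive NatList where
  | nil
  | cons (head : Nat) (tail : NatList)
with
  @[computedField] length : NatList → Nat
    | .nil => 0
    | .cons _ tail => tail.length + 1

Every cons cell then carries its own length, which is where the per-cell overhead comes from.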

James Gallicchio (Jul 19 2022 at 00:24):

Is the definition of the function also exposed for proof purposes? I was doing some backflips before to cache these values -- super useful feature :)

Sebastian Ullrich (Jul 19 2022 at 08:10):

It's just a regular function on the datatype from the kernel's PoV
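
So, continuing the hypothetical Tree sketch above, one would expect proofs about the field to go through like proofs about any other recursive function:

example (l r : Tree) : (Tree.node l r).size = l.size + r.size + 1 := rfl
-- or, if definitional reduction gets stuck, something like: by simp [Tree.size]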

Chris Bailey (Jul 19 2022 at 19:16):

Can this be used with structures and/or classes, or am I thinking about this incorrectly? I think the other alternative is to make it a def and use the caching offered by Thunk?

For instance replacing sum and h_sum with a computed field:

structure X :=
  (a : Nat)
  (b : Nat)
  (sum : Nat)
  (h_sum : sum = a + b)

Gabriel Ebner (Jul 19 2022 at 19:17):

Note that Thunk is different: Thunk is lazy while the computed fields for inductives are strict.

Gabriel Ebner (Jul 19 2022 at 19:23):

The reason computed fields are not implemented (yet) for structures is twofold: 1) structures are a bit more work since you need to override the projections as well, and 2) it's much less urgent for structures since you can simulate most of the feature:

def Cached {α : Type u} (a : α) := { cached : α // cached = a }

instance : EmptyCollection (Cached a) where emptyCollection := ⟨a, rfl⟩
instance : Inhabited (Cached a) where default := {}
instance : Subsingleton (Cached a) where
  allEq := fun ⟨b, hb⟩ ⟨c, hc⟩ => by subst hb; subst hc; rfl
instance : DecidableEq (Cached a) := fun _ _ => isTrue (Subsingleton.allEq ..)
instance : Hashable (Cached a) where hash _ := 0

structure A where
  a : Nat
  hash : Cached (Hashable.hash a) := {}
  deriving DecidableEq, Hashable

@[simp] theorem eq_of_subsingleton [Subsingleton α] {a b : α} : (a = b) = True := by
  simp [Subsingleton.allEq a b]

theorem A.extIff {a b : A} : a = b ↔ a.a = b.a :=
  ⟨fun h => by subst h; rfl, by cases a; cases b; intro h; cases h; simp⟩

The only downside of this simulation is that you get extra fields. But you don't get any exotic terms where the extra fields have the wrong value.

cognivore (Sep 26 2022 at 13:57):

(Deleted unreproducible bug report)

cognivore (Sep 26 2022 at 13:58):

(Deleted irrelevant version details)

cognivore (Sep 26 2022 at 13:59):

It's interesting that computedField doesn't seem to be used in either lean4 or mathlib. Is there a reason? Am I just bad at grepping?

Sebastian Ullrich (Sep 26 2022 at 14:45):

rg '\[computedField' src
src/Lean/Level.lean
97:  @[computedField] data : Level → Data

src/Lean/Expr.lean
469:  @[computedField, extern c inline "lean_ctor_get_uint64(#1, lean_ctor_num_objs(#1)*sizeof(void*))"]

src/Init/Prelude.lean
3299:  @[computedField] hash : Name → UInt64

src/Lean/Elab/ComputedFields.lean
38:      throwError "The @[computedField] attribute can only be used in the with-block of an inductive"
202:        logError m!"'{computedFieldName}' must be tagged with @[computedField]"

cognivore (Sep 26 2022 at 14:49):

Funny. Sorry, I don't think I had my terminals straight today.


Last updated: Dec 20 2023 at 11:08 UTC