Zulip Chat Archive

Stream: batteries

Topic: Large vector hangs


Violeta Hernández (Sep 27 2024 at 22:25):

I'm running into this bizzare issue: a vector of length 32 can be defined almost instantly, but a vector of length 33 seemingly hangs indefinitely.

import Batteries.Data.Vector.Basic

-- instant
example : Batteries.Vector Nat 32 :=
  #v[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

-- (deterministic) timeout at `whnf`, maximum number of heartbeats (200000) has been reached
example : Batteries.Vector Nat 33 :=
  #v[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

Violeta Hernández (Sep 27 2024 at 22:31):

Batteries.Vector.mkVector 33 1 works just fine, so I suspect this is an issue with the #v macro

Violeta Hernández (Sep 27 2024 at 22:42):

Actually, this still reproduces the bug:

import Batteries.Data.Vector.Basic

-- instant
example : Batteries.Vector Nat 32 :=
  List.toArray [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

-- (deterministic) timeout at `whnf`, maximum number of heartbeats (200000) has been reached
example : Batteries.Vector Nat 33 :=
  List.toArray [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

Violeta Hernández (Sep 27 2024 at 22:43):

But replacing rfl by sorry gets the bug to disappear

Violeta Hernández (Sep 27 2024 at 22:44):

As does replacing rfl with by simp

Shreyas Srinivas (Sep 27 2024 at 22:50):

Are you able to get the same bug with arrays?

François G. Dorais (Sep 27 2024 at 22:51):

Very interesting! Issues in Batteries have been very underused but this may change in the near future. Please feel free to file a bug report!

That said, the definition of Vector is pretty straightforward so there is a decent chance the underlying issue is in core. Can you minimize further by defining your own Vector type?

structure MyVector (α : Type u) (n : Nat) where
  toArray : Array α
  size_eq : toArray.size = n

Shreyas Srinivas (Sep 27 2024 at 22:51):

Specifically, array notation?

Violeta Hernández (Sep 27 2024 at 22:52):

The issue doesn't seem to appear with array

import Init.Prelude

-- length 32, also works
example : Array Nat :=
  List.toArray [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

-- length 33, also works
example : Array Nat :=
  List.toArray [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

Shreyas Srinivas (Sep 27 2024 at 22:52):

No that's converting lists to arrays

Shreyas Srinivas (Sep 27 2024 at 22:52):

I think there is a separate array notation.

Shreyas Srinivas (Sep 27 2024 at 22:52):

That uses mkArray

Violeta Hernández (Sep 27 2024 at 22:53):

Yep, works too

import Init.Prelude

example : Array Nat :=
  #[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

example : Array Nat :=
  #[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

François G. Dorais (Sep 27 2024 at 22:53):

The issue is almost surely with rfl.

Shreyas Srinivas (Sep 27 2024 at 22:53):

Are you getting the same issue with your custom vector type + notation?

Violeta Hernández (Sep 27 2024 at 22:54):

I suspect as much, this reproduces the bug

import Init.Prelude

structure MyVector (α : Type u) (n : Nat) where
  toArray : Array α
  size_eq : toArray.size = n

-- works
example : MyVector Nat 32 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

-- doesn't work
example : MyVector Nat 33 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

Violeta Hernández (Sep 27 2024 at 22:54):

I'll file a bug under core Lean

Shreyas Srinivas (Sep 27 2024 at 22:55):

Then the issue is with rfl. How does rfl interact with the new wf reduction opaqueness policy?

François G. Dorais (Sep 27 2024 at 22:55):

That's what I would recommend. Thank you for letting us know about this!

Violeta Hernández (Sep 27 2024 at 22:57):

Huh? This issue doesn't reproduce in Lean nightly
https://live.lean-lang.org/#project=lean-nightly&codez=JYWwDg9gTgLgBASQHbBgOgApQKYBsCuAJtgFAkDOMU+AxjPjnALICeAatndHABSCNwHABccACosw2OPgCUvJELgA5AIYxZAdwAW2HCThwYEAIJQoylgpNmLfPXHLAAXtgD62AI4LDV82gfO4AF44JDJsAA9lcFxJYVYOLiglVTgAZgAmIUC7QAvyAGIAbQBGABpS8rLKiuqq2pr6usb6gF0SuCgAM1xAS/I4MMjo2OZ2TkMklXhU1Kzcwob5psWF5aXW9q7uoA

Shreyas Srinivas (Sep 27 2024 at 22:57):

Are you able to get the proof done with some other tactic, say simp?

Violeta Hernández (Sep 27 2024 at 22:57):

simp does work in this instance, yes

Shreyas Srinivas (Sep 27 2024 at 23:02):

Then maybe we ought to modify the notation to get the proof from simp

Shreyas Srinivas (Sep 27 2024 at 23:03):

I'll fix this on the batteries side

Shreyas Srinivas (Sep 27 2024 at 23:09):

Is it linked to this by any strange chance: https://github.com/leanprover/lean4/pull/5446

François G. Dorais (Sep 27 2024 at 23:12):

The issue is the same in lean-nightly, you just have to wait a while longer to see it happen on the web interface.

Violeta Hernández (Sep 27 2024 at 23:12):

I'm starting to think this might have nothing to do with arrays. I'm running into a separate issue where making a list of length 33 causes Lean to eat up all of my memory
image.png
This one I can't quite as easily reproduce though

Violeta Hernández (Sep 27 2024 at 23:13):

(and again, 32 is the magic number where it works fine)

Jannis Limperg (Sep 27 2024 at 23:17):

Afaik there are different elaborators for large and small array/list literals. Perhaps 32 is the cutoff and the large elaborator is somehow producing code that is difficult for rfl to evaluate.

François G. Dorais (Sep 27 2024 at 23:17):

@Kim Morrison is working a lot with array code right now, so I'm pinging them to take a look.

François G. Dorais (Sep 27 2024 at 23:19):

It's actually the weekend for them so we shouldn't expect a response in the immediate future.

François G. Dorais (Sep 27 2024 at 23:20):

Anyway, a bug report in core is the right way to go.

Kyle Miller (Sep 27 2024 at 23:20):

Jannis Limperg said:

Perhaps 32 is the cutoff

Yes, you can see it with

#check #[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

and then adding an extra 1.

Kyle Miller (Sep 27 2024 at 23:24):

Interestingly, by rfl makes it elaborate instantly.

example : MyVector Nat 33 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1], by rfl

I guess somehow rfl is causing some bad defeq problems if it happens too soon?

Violeta Hernández (Sep 27 2024 at 23:24):

I've managed to reproduce the bug on bare lists! Give me a second

Violeta Hernández (Sep 27 2024 at 23:27):

import Init.Prelude

example : List Nat :=
  let l := [1]
  [l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0], l[0]]

Adding one more instance of l[0] makes Lean eat up all your RAM :cold_sweat:

Violeta Hernández (Sep 27 2024 at 23:27):

yet again, 32 is the magic number

Kyle Miller (Sep 27 2024 at 23:28):

I suspect that's a different performance bug, with indexing notation.

Kyle Miller (Sep 27 2024 at 23:30):

Though on second thought, maybe it's related, and it has to do with a bad interaction with the lets... (never mind, I think it's unrelated.)

Violeta Hernández (Sep 27 2024 at 23:30):

Oh nevermind, in this example it just takes a bit longer. In my actual code where the indexing list was more complicated it was in fact eating all my RAM

Violeta Hernández (Sep 27 2024 at 23:33):

I've opened #5502

Shreyas Srinivas (Sep 27 2024 at 23:34):

you might want to add the array example too

Shreyas Srinivas (Sep 27 2024 at 23:34):

It's clearly not isolated to vectors

Shreyas Srinivas (Sep 27 2024 at 23:35):

under steps to reproduce

Violeta Hernández (Sep 27 2024 at 23:35):

Which example? Declaring an array of length 33 worked just fine

Mario Carneiro (Sep 27 2024 at 23:38):

Yes, this has been known for a while (or at least I've complained about it on multiple occasions). There is a switchover to a different compilation strategy for large lists, but the strategy is actually much worse than the simple one

Mario Carneiro (Sep 27 2024 at 23:39):

it creates nested let bindings that break a bunch of fast paths in the compiler and elaborator

Violeta Hernández (Sep 27 2024 at 23:40):

Is there a way to disable it? Or at least increase the limit at which it takes hold?

Mario Carneiro (Sep 27 2024 at 23:40):

The reason it exists is because without it large list literals create very deep terms that lean doesn't handle well, but this could also be done by generating trees of appends instead

Mario Carneiro (Sep 27 2024 at 23:41):

also, the compiler unfolds all the let bindings and creates the deep terms anyway, so it seems kind of pointless

Mario Carneiro (Sep 27 2024 at 23:42):

You can disable it by locally overriding list notation to not use the bad macro

Shreyas Srinivas (Sep 27 2024 at 23:42):

So I tried all the failing examples with rfl above and by rfl works

Shreyas Srinivas (Sep 27 2024 at 23:42):

For example:

import Init.Prelude

structure MyVector (α : Type u) (n : Nat) where
  toArray : Array α
  size_eq : toArray.size = n

-- works
example : MyVector Nat 32 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

-- doesn't work
example : MyVector Nat 33 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], (by rfl)

Shreyas Srinivas (Sep 27 2024 at 23:43):

I didn't remove your comments from the original example, but they both work

Violeta Hernández (Sep 27 2024 at 23:43):

Mario Carneiro said:

You can disable it by locally overriding list notation to not use the bad macro

how?

Kyle Miller (Sep 27 2024 at 23:44):

(@Shreyas Srinivas I mentioned that here)

Shreyas Srinivas (Sep 27 2024 at 23:45):

oh yeah. sorry

Shreyas Srinivas (Sep 27 2024 at 23:45):

I must have passed over it.

Mario Carneiro (Sep 27 2024 at 23:46):

universe u
structure MyVector (α : Type u) (n : Nat) where
  toArray : Array α
  size_eq : toArray.size = n

open Lean
macro_rules
  | `([ $elems,* ]) => do
    let rec expandListLit (i : Nat) (skip : Bool) (result : TSyntax `term) : MacroM Syntax := do
      match i, skip with
      | 0,   _     => pure result
      | i+1, true  => expandListLit i false result
      | i+1, false => expandListLit i true  ( ``(List.cons $(elems.elemsAndSeps.get! i) $result))
    let size := elems.elemsAndSeps.size
    expandListLit size (size % 2 == 0) ( ``(List.nil))

-- works
example : MyVector Nat 32 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

-- works
example : MyVector Nat 33 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

Mario Carneiro (Sep 27 2024 at 23:47):

this is just a copy paste of the original macro with the slow path removed

Kyle Miller (Sep 27 2024 at 23:47):

It's possible that the difference between by rfl and rfl is that in this case the rfl tactic can and does use kernel defeq

François G. Dorais (Sep 27 2024 at 23:49):

Since Vector have known length, it is possible to figure out much better strategies before parsing the array. So maybe Vector needs an independent implementation of the said macro for maximal efficiency.

Shreyas Srinivas (Sep 27 2024 at 23:50):

I started the PR with by_rfl, but maybe I can add this macro Mario provided

Shreyas Srinivas (Sep 27 2024 at 23:51):

batteries#968

Violeta Hernández (Sep 27 2024 at 23:52):

Mario Carneiro said:

universe u
structure MyVector (α : Type u) (n : Nat) where
  toArray : Array α
  size_eq : toArray.size = n

open Lean
macro_rules
  | `([ $elems,* ]) => do
    let rec expandListLit (i : Nat) (skip : Bool) (result : TSyntax `term) : MacroM Syntax := do
      match i, skip with
      | 0,   _     => pure result
      | i+1, true  => expandListLit i false result
      | i+1, false => expandListLit i true  ( ``(List.cons $(elems.elemsAndSeps.get! i) $result))
    let size := elems.elemsAndSeps.size
    expandListLit size (size % 2 == 0) ( ``(List.nil))

-- works
example : MyVector Nat 32 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

-- works
example : MyVector Nat 33 :=
  ⟨#[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1], rfl

My code with nested indexing doesn't hang anymore with this workaround! Thanks!

Shreyas Srinivas (Sep 27 2024 at 23:53):

OTOH, this macro makes no sense for vector since the existing one calls the array notation

Mario Carneiro (Sep 27 2024 at 23:58):

Here's a version of the slow path that constructs append trees instead of nested lets:

macro_rules
  | `([ $elems,* ]) => do
    let elems := elems.getElems
    let rec expand (lo i : Nat) (result : TSyntax `term) : MacroM (TSyntax `term) := do
      match i with
      | 0   => pure result
      | i+1 => expand lo i ( ``(List.cons $(elems[lo+i]!⟩) $result))
    let nil  ``(List.nil)
    let rec bisect (lo hi : Nat) : MacroM (TSyntax `term) := do
      if lo + 32 < hi then
        let mid := (lo + hi) / 2
        ``(List.append $( bisect lo mid) $( bisect mid hi))
      else
        expand lo (hi - lo) nil
    bisect 0 elems.size

Mario Carneiro (Sep 28 2024 at 00:02):

Also this macro is inefficient in that it constructs a big term to be processed by the elaborator, instead of just constructing the term directly, saving on a lot of metavariable construction and application elaboration

Mario Carneiro (Sep 28 2024 at 00:03):

e.g. lean has to work out that the type variables in all the list applications are the same, but this also interacts nontrivially with coercions inside the elements; this work can be shortcut by doing one big unification

Violeta Hernández (Sep 28 2024 at 00:52):

I'm running into yet another issue with vectors: indexing is really slow!

import Batteries.Data.Vector.Basic

-- Works, really slow.
example : #v[1,1,1,1,1,1,1,1,1,1,1,1,1,1][4] = 1 := rfl

-- Timeout
example : #v[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1][4] = 1 := rfl

Violeta Hernández (Sep 28 2024 at 00:53):

This is even with Mario's workaround

Violeta Hernández (Sep 28 2024 at 00:54):

And even with by rfl

Violeta Hernández (Sep 28 2024 at 00:55):

Actually, this times out even if the proof is sorried.

Violeta Hernández (Sep 28 2024 at 00:57):

This issue doesn't appear with lists or even with arrays

Mario Carneiro (Sep 28 2024 at 00:57):

this is in part caused by List.toArray, for which a fix recently landed in lean core

Shreyas Srinivas (Sep 28 2024 at 00:57):

The by rfl fix is ready and passed both batteries CI and mathlib build: https://github.com/leanprover-community/batteries/pull/968#issuecomment-2380322025

Mario Carneiro (Sep 28 2024 at 00:59):

here's a demonstration of the effect that the List.toArray to Array.mk change will have:

-- today: slow
example : (Vector.mk (n := 15) (List.toArray [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]) rfl)[4] = 1 := rfl
-- tomorrow: fast
example : (Vector.mk (n := 15) (Array.mk [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]) rfl)[4] = 1 := rfl

Mario Carneiro (Sep 28 2024 at 00:59):

that expression is the result of macro expansion on #v[...] syntax

Mario Carneiro (Sep 28 2024 at 01:00):

cc @Kim Morrison in case you thought it didn't matter :)

Violeta Hernández (Sep 28 2024 at 01:02):

Is the relevant PR this one? https://github.com/leanprover/lean4/pull/5446

Mario Carneiro (Sep 28 2024 at 01:04):

lean4#5452 is attempt number 2 at that PR

Violeta Hernández (Sep 28 2024 at 01:07):

Overrode the #[] notation in my code, and yeah, this is working like a charm

Kim Morrison (Sep 28 2024 at 03:54):

Violeta Hernández said:

Is the relevant PR this one? https://github.com/leanprover/lean4/pull/5446

This one is still potentially in flight. I think it should have no effect (hence not worrying about it too much!), but it fails in rebootstrapping. If anyone could explain to me what is going on there I'd be grateful.


Last updated: May 02 2025 at 03:31 UTC