Zulip Chat Archive

Stream: general

Topic: universe hackery


Kevin Buzzard (May 03 2023 at 09:41):

(From user BeLazy on the Discord, thought it might amuse, this is Lean 4):

#check Type 37 -- maximum universe level offset threshold (32) has been reached,
#check Type (32 + 5) -- `Type 37 : Type 38`

def Lift100.{u} (A: Type u):=  ULift.{u+32+32+32+4,u} A

def Lift1000 := Lift100   Lift100  Lift100   Lift100  Lift100 
  Lift100  Lift100   Lift100  Lift100   Lift100

#check Lift1000 Type
-- Lift1000 Type : Type 1001

Reid Barton (May 03 2023 at 09:49):

Will it overflow?

Reid Barton (May 03 2023 at 09:50):

I guess by defining Lift10000, etc. you can get the universe levels exponentially high but maybe Lean will also take exponentially long to type check them

Arthur Adjedj (May 03 2023 at 09:55):

This crashes on my PC:

def Lift100.{u} (A: Type u):=  ULift.{u+32+32+32+4,u} A

def Lift1000 := Lift100   Lift100  Lift100   Lift100  Lift100   Lift100  Lift100   Lift100  Lift100   Lift100

def Lift10000 := Lift1000   Lift1000  Lift1000   Lift1000  Lift1000   Lift1000  Lift1000   Lift1000  Lift1000  Lift1000

def Lift100000 := Lift10000   Lift10000  Lift10000   Lift10000  Lift10000   Lift10000  Lift10000   Lift10000  Lift10000  Lift10000

def Lift1000000 :=
    Lift100000 
    Lift100000 
    Lift100000 
    Lift100000 
    Lift100000 
    Lift100000 
    Lift100000 
    Lift100000 
    Lift100000 
    Lift100000

Arthur Adjedj (May 03 2023 at 09:56):

The crash is likely due to the fact that such universes are encoded as unary successors, and don't get the same kind of low-level implementation/optimisation that Nats and Ints do.

Arthur Adjedj (May 03 2023 at 10:10):

I encountered similar issues when implementing our universe system for Proost, the solution we landed on was to replace the succ : Level → Level constructor to a plus : Level → Nat → Level constructor. This would be both more efficient in memory and should give a faster type-checking algorithm, at least if the one currently implemented in Lean 4 still resembles the one described in @Mario Carneiro 's thesis.

Mario Carneiro (May 03 2023 at 10:11):

I think lean itself does this

Mario Carneiro (May 03 2023 at 10:12):

There is a hard coded limit on universe level sizes, I thought it was much less than 100000 though

Mario Carneiro (May 03 2023 at 10:13):

it has a snarky message about misusing universes

Mario Carneiro (May 03 2023 at 10:13):

ah right, #check Type 64 shows this

Arthur Adjedj (May 03 2023 at 10:14):

Mario Carneiro said:

I think lean itself does this

perhaps the "low-level" implementation of levels do, but in the Lean code at least, Levels are encoded as such:

inductive Level where
  | zero   : Level
  | succ   : Level  Level
  | max    : Level  Level  Level
  | imax   : Level  Level  Level
  | param  : Name  Level
  | mvar   : LMVarId  Level

Arthur Adjedj (May 03 2023 at 10:15):

Mario Carneiro said:

ah right, #check Type 64 shows this

It can be circumvented by adding things together.#check Type (32+32) doesn't raise any issue

Mario Carneiro (May 03 2023 at 10:15):

there is another limit beyond this, I forget where it is

Mario Carneiro (May 03 2023 at 10:16):

I'm thinking of the to_offset() function in level.cpp BTW, the syntax uses naive succ but it is converted to a (level, offset) pair for normalization

Mario Carneiro (May 03 2023 at 10:18):

and BTW to_offset uses an unsigned with no overflow checks so if you can make it to 2^32 without a crash you might get a soundness bug out of it

Arthur Adjedj (May 03 2023 at 10:18):

oooooh, OMW to do so then

Mario Carneiro (May 03 2023 at 10:19):

it's probably not possible without overflowing the stack, because you will have to create a very deep Level term

Arthur Adjedj (May 03 2023 at 10:19):

lean seems to stack overflow before that though, yeah

Mario Carneiro (May 03 2023 at 10:20):

maybe if you raise the stack limit...?

Mario Carneiro (May 03 2023 at 10:20):

I don't know what functions are going to get hit hardest

Mario Carneiro (May 03 2023 at 10:21):

you will need (2^32 * stack frame of recursive function) bytes of memory, which is a lot but might be feasible

Arthur Adjedj (May 03 2023 at 10:22):

I fear that the lift composition trick up here might give the elaboration a hard time unifying that many universes, so simply doing Type 32 +........ +32 might be wiser here. Do you know of any easy way to raise the stack limit ?

Mario Carneiro (May 03 2023 at 10:23):

Use addDecl and just hand the level directly to the kernel

Mario Carneiro (May 03 2023 at 10:25):

aha, here's the hidden limit: https://github.com/leanprover/lean4/blob/master/src/Lean/Level.lean#L46

Mario Carneiro (May 03 2023 at 10:28):

Even just using Level.succ enough times should trigger that

Arthur Adjedj (May 03 2023 at 10:30):

So the error would be raised as soon as the computed field would get computed first. This seems hard to avoid

Mario Carneiro (May 03 2023 at 10:32):

yeah I would say that this is not a soundness bug

Mario Carneiro (May 03 2023 at 10:33):

although it still seems difficult to hit this using user code, since all the elaboration tricks overflow

Arthur Adjedj (May 03 2023 at 10:34):

Wouldn't the kdef trick still work here to send all that to the kernel without checking any of this during elaboration ?

Mario Carneiro (May 03 2023 at 10:34):

No, because you would need to construct a Level to pass to the kernel and you can't even do that

Mario Carneiro (May 03 2023 at 10:35):

basically, there is a bounds check, it is just not in the kernel but in the level constructor

Mario Carneiro (May 03 2023 at 10:35):

(which is part of the kernel in a sense)

Arthur Adjedj (May 03 2023 at 10:36):

Aren't computed fields lazily computed ? As long as no function calling for Level.data is used to generate the kdef, why wouldn't this work ?

Mario Carneiro (May 03 2023 at 10:36):

no

Arthur Adjedj (May 03 2023 at 10:36):

oh, my bad then

Mario Carneiro (May 03 2023 at 10:37):

computed fields are extra fields stored with the data and eagerly computed on each constructor application

Mario Carneiro (May 03 2023 at 10:38):

before computed fields all these constructors had an additional argument for the data field and you had to use smart constructors to build the values

Arthur Adjedj (May 03 2023 at 10:38):

My belief was that, while stored with the data, they were still computed "on demand", similar to doing OnceCell headers in Rust

Mario Carneiro (May 03 2023 at 10:38):

but this is also sketchy since you could just not use the smart constructor

Mario Carneiro (May 03 2023 at 10:38):

it's closer to e.g. the len field on a HashMap

Arthur Adjedj (May 03 2023 at 10:39):

The more you know ! Thanks for the insight :)

Mario Carneiro (May 03 2023 at 10:39):

the fix for that is to add an equality hypothesis BTW, but lean core doesn't use a lot of dependent types


Last updated: Dec 20 2023 at 11:08 UTC