Zulip Chat Archive

Stream: lean4 dev

Topic: RFC: Use unsigned ints for `FVarId` etc.


Jannis Limperg (Jan 20 2023 at 21:39):

I wrote a patch for Lean 4 which uses UInt64 instead of Name for FVarId, MVarId and LMVarId. My hypothesis is that this will improve performance since these types are used as hash map keys throughout the system.

The compiler part of the patch turned out to be quite straightforward. There are a few places where subroutines need distinct ID 'namespaces'; to accommodate this, I currently reserve 8 bits of the ID for an 'ID class'.

However, the patch breaks the kernel since it also uses Names in its Expr representation. If there's interest in this patch, I would need help with this part.

Sebastian Ullrich (Jan 20 2023 at 21:56):

Without taking a position on the RFC, I'm constantly surprised to never really see anything of hashing and hash maps when profiling (except for the interpreter)

Sebastian Ullrich (Jan 20 2023 at 21:58):

Also this should probably move to #lean4 dev

Jannis Limperg (Jan 20 2023 at 22:16):

Sebastian Ullrich said:

Without taking a position on the RFC, I'm constantly surprised to never really see anything of hashing and hash maps when profiling (except for the interpreter)

Ha, then maybe it doesn't matter that much. The patch also reduces the size of expressions a little bit and should speed up all equality and ordering comparisons involving IDs. Whether this amounts to anything substantial I don't dare to predict.

Jannis Limperg (Jan 20 2023 at 22:16):

Sebastian Ullrich said:

Also this should probably move to #lean4 dev

Feel free to move it; I don't have the necessary rights.

Jannis Limperg (Jan 20 2023 at 22:40):

Patch for the Lean parts is here: https://github.com/JLimperg/lean4/tree/efficient-ids

Notification Bot (Jan 21 2023 at 01:05):

This topic was moved here from #lean4 > RFC: Use unsigned ints for FVarId etc. by Mario Carneiro.

Mac (Jan 24 2023 at 18:02):

@Jannis Limperg Why UInt64 instead of Nat? For normal values there should not be much difference and it removes the theoretical possibility of overflow from the abstraction (assisting any eventual proofs involving that part of the system).

Jannis Limperg (Jan 25 2023 at 16:09):

If this is preferred, it's fine by me as well. It would sacrifice some performance: Nats are behind a pointer (I think) and one would need a pair or structure to integrate the ID class, so that makes three pointer dereferences for an equality check. Personally, I don't mind a kernel that only does the first 72057594037927936 inferences correctly, but reasonable people can disagree on this.

Gabriel Ebner (Jan 25 2023 at 17:57):

Nat's are behind a pointer

Lean uses pointer tagging for small numbers. The overhead is a right-shift, not a pointer dereference.

Jannis Limperg (Jan 26 2023 at 10:34):

Ah okay, good to know!


Last updated: Dec 20 2023 at 11:08 UTC