Zulip Chat Archive

Stream: lean4

Topic: simple binary naturals in Lean4


michal wallace (tangentstorm) (Jun 26 2022 at 18:18):

I can see that I can't really do better than this when I'm converting from the peano definitions, but ... Surely Lean itself doesn't represent the number 54321 internally as a linked list of that many succ applications, right? I can see references to UInt32 ... but I don't (yet) see anything like even or odd or giveMeTheBitsOfThisArbitrarilySizedUnsignedInteger...

should I care about this at all?

Henrik Böving (Jun 26 2022 at 18:21):

Lean will internally use big integers (right now GMP but there is a plan to move away from that eventually) to represent a Nat, the UInt types you are mentioning get translated to their C equivalent by the compiler. The reason you are not seing methods defined on stuff is that we didn't need them yet so nobody wrote them.

Henrik Böving (Jun 26 2022 at 18:22):

Either way your implementation will indeed be far inferior to the built-in Nat performance wise.

michal wallace (tangentstorm) (Jun 26 2022 at 18:26):

fair enough. i don't need to use it for actual calculations. just wanted to make sure i understood. thanks, @Henrik Böving !

michal wallace (tangentstorm) (Jun 26 2022 at 18:31):

code is here if anyone is interested: https://github.com/tangentstorm/tangentlabs/blob/master/lean4/Bin/Bin.lean

michal wallace (tangentstorm) (Jun 26 2022 at 18:42):

My other main question is from a math/proof perspective. I included this constructor that represents the infinite stream of zeros... So the mapping between Nat -> Bin is a 1-to-many relationship. (I don't know what you call it in math land.) Like... 1 is I B but so is I O B and I O O O O O B ....

Bascially, B is an infinite stream of zero digits, so you can multiply it by 2 as many times as you like and it's still the same.

How would I approach telling Lean that I expect this to map 1-to-1 with Nat, but only for some well-behaved Bin instances?

michal wallace (tangentstorm) (Jun 26 2022 at 18:46):

def wellFormed : Bin  Bool
| B => true
| O B => false
| O x => wellFormed x
| I x => wellFormed x

Henrik Böving (Jun 26 2022 at 18:48):

You could for example define what it means to be a well behaved bin instance and then tell lean that if you have a bin instance that is well behaved there exists a bijection between Nat and Bin.

The nicest way would however be if you were to drop your current representation and would attempt to come up with one that can only produce well behaved values, for example one might come up with a non empty list of 0s and 1s type like:

inductive Bin
  | zero : Bin
  | one : Bin

inductive BinNumber where
  | single : Bin -> BinNumber
  | additional : Bin -> BinNumber -> BinNumber

though there are of course many ways to represent binary numbers that might be better than this

Kevin Buzzard (Jun 26 2022 at 18:48):

In Lean 3 we have docs#pos_num and docs#num which do this.

michal wallace (tangentstorm) (Jun 26 2022 at 18:59):

@Henrik Böving i like that, but it doesn't quite solve the problem.... you can still do additional zero (single zero)...

michal wallace (tangentstorm) (Jun 26 2022 at 19:00):

@Kevin Buzzard has an awesome lean 3 tutorial that just does away with zero https://www.youtube.com/watch?v=9f7WQkKd6x0 but i guess i wanted to make things complicated for myself. :)

michal wallace (tangentstorm) (Jun 26 2022 at 19:00):

i see that is pos_num

michal wallace (tangentstorm) (Jun 26 2022 at 19:01):

aha! and then num creates a bigger type with the zero.

michal wallace (tangentstorm) (Jun 26 2022 at 19:03):

still i feel like part of the promise of a dependent type system is that i should be able to do this my (possibly dumb) way. :)

Henrik Böving (Jun 26 2022 at 19:05):

Sure, you can define your well behaved thingy and then use a docs4#Subtype to define well behaved binary numbers based on that but things will get a little annoying on recursive calls because you have to always provide proofs that your well behavedness property is not broken.

The general rule of thumb i've learn from CPDT is that if you have a simpler way that does not involve using a dependent type use it.

Kevin Buzzard (Jun 26 2022 at 19:05):

If you have too many things representing one thing then you can also take a quotient.

michal wallace (tangentstorm) (Jun 26 2022 at 19:13):

http://adam.chlipala.net/cpdt/ ?

Henrik Böving (Jun 26 2022 at 19:13):

Yes

michal wallace (tangentstorm) (Jun 26 2022 at 19:14):

Is porting pos_num and num to lean4 something we want, and just nobody's gotten around to it, or are they going away in mathlib4?

Henrik Böving (Jun 26 2022 at 19:16):

You could basically copy their definition and change the naming scheme and everything would be fine already.

That being said the Lean3->4 porting tool mathport will eventually take care of it

michal wallace (tangentstorm) (Jun 26 2022 at 19:21):

Aha. The parsing and compilation features are actually the whole reason I wanted to learn lean4 instead 3. I was going to offer to volunteer to do the porting work, but maybe my time would be better spent learning about mathport.

Kevin Buzzard (Jun 26 2022 at 19:29):

The blog post is now 6 months out of date but contains at least some basic information.

Henrik Böving (Jun 26 2022 at 19:40):

The main thing that will be necessary apart from mathport is porting the lean 3 tactics to lean 4 for which we are currently putting together ressources here: https://github.com/arthurpaulino/lean4-metaprogramming-book/, if you wanna help the porting effort porting tactics will probably be the most efficient way

michal wallace (tangentstorm) (Jun 26 2022 at 19:50):

i would definitely like to get into tactics, but i feel like that's probably too much for me to learn all as a next step.

michal wallace (tangentstorm) (Jun 26 2022 at 19:53):

Looking at https://raw.githubusercontent.com/leanprover-community/mathlib/master/src/data/num/basic.lean and https://raw.githubusercontent.com/leanprover-community/mathlib3port/master/Mathbin/Data/Num/Basic.lean , I think I see some low-hanging fruit where the translation is breaking.

michal wallace (tangentstorm) (Jun 26 2022 at 19:53):

is cases_on a tactic?

Henrik Böving (Jun 26 2022 at 19:55):

No it is a function the Lean compiler is generating automatically on things out of Type u, what things are you seeing that break?

michal wallace (tangentstorm) (Jun 26 2022 at 20:10):

basic things i'm seeing:

  • it couldn't figure out the lean4 version of @deriving has_reflect (if there is one?)
  • the presence of has_one or has_zero in the input should trigger an import Mathlib.Algebra.Group.Defs line in the output
  • it's choosing to capitalize the pos constructor for Posnum and Znum in some outputs (but not others), while zero and neg remain lower case.

michal wallace (tangentstorm) (Jun 26 2022 at 20:11):

protected def add : PosNum  PosNum  PosNum
  | 1, b => succ b
  | a, 1 => succ a
  | bit0 a, bit0 b => bit0 (add a b)
  | bit1 a, bit1 b => bit0 (succ (add a b))
  | bit0 a, bit1 b => bit1 (add a b)
  | bit1 a, bit0 b => bit1 (add a b)

lean4 wants the references to add in this output to be PosNum.add ... maybe because add is marked protected?

michal wallace (tangentstorm) (Jun 26 2022 at 20:11):

here is where cases_on is causing trouble:

def pred' : PosNum  Num
  | 1 => 0
  | bit0 n => Num.pos (Num.casesOn (pred' n) 1 bit1)
  | bit1 n => Num.pos (bit0 n)

michal wallace (tangentstorm) (Jun 26 2022 at 20:13):

this one was originally called nat_size and gets renamed, but not in the recursive calls:

def natSize : PosNum  Nat
  | 1 => 1
  | bit0 n => Nat.succ (nat_size n)
  | bit1 n => Nat.succ (nat_size n)

michal wallace (tangentstorm) (Jun 26 2022 at 20:14):

cmp returns an Ordering and the constants are mis-capitalized...

michal wallace (tangentstorm) (Jun 26 2022 at 20:14):

and so on. these all seem like low-hanging fruit to me (except cases_on / casesOn and possibly finding a replacement for has_reflect)

michal wallace (tangentstorm) (Jun 26 2022 at 20:15):

but also i'm saying that before looking at the transpiler. :D

michal wallace (tangentstorm) (Jun 26 2022 at 20:27):

i think the casesOn is actually fine, it's the recursive call that's the problem.

michal wallace (tangentstorm) (Jun 26 2022 at 20:32):

there aren't really that many proofs in this file. the only tactic i see being used is dsimp' , which does seem to be missing.

michal wallace (tangentstorm) (Jun 26 2022 at 20:36):

okay. I will commit to manually porting this file.

If succeed at that, I will look into having mathport make my changes automatically, but I'm not quite ready to commit to that yet.

Henrik Böving (Jun 26 2022 at 20:39):

the has_reflect part will actually have to be a meta program since deriving handlers are meta programs :p, apart from that it should definitely be possible. If you wanna help with mathport the people to talk to will be Gabriel Ebner and Daniel Selsam

michal wallace (tangentstorm) (Jun 26 2022 at 20:43):

ok. should I just keep posting in this thread, or make a new one somewhere else?

Henrik Böving (Jun 26 2022 at 20:46):

To my knowledge there hasn't been a thread regarding people that want to contribute to mathport yet so it could definitely be worth to make a new one to discuss that

michal wallace (tangentstorm) (Jun 26 2022 at 20:52):

oh i see. my understanding is that lean4 has a general parsing framework at its core... maybe tactics, has_reflect and mathport itself are all using this same framework.

michal wallace (tangentstorm) (Jun 26 2022 at 20:53):

in that case, tactics are probably just little compilers that also have access to the current state and a database of known theorems... huh.

Henrik Böving (Jun 26 2022 at 21:01):

If you want to know what Lean 4 does in detail you can read the meta programming book linked above^^

To sum it up:

  • you can define close to arbitrary syntax with it
  • you can define macros, that is syntax -> syntax translations to give a semantic to your syntax
  • you can define term elaborators (they turn syntax into expressions, that is the low level dependent lambda calculus lean is based on) and command elaborators (the thing that handle stuff like def, theorem and interact with what is called the Environmentm)
  • you can define tactics which are an abstraction even stronger than term elaborators, they use meta variables to incrementally construct expressions which are then fed to the type checker in order to check your proof (though tactics can in theory also create non proof terms but that is another topic)

mathport has two modes of operation, the first is binport which turns the lean 3 binary .olean format into lean 4 one and is already working quite well, the second one is synport which does the syntax -> syntax translation, looking at the mathport code it does not seem to me like they are making use of this machinery explained above.

michal wallace (tangentstorm) (Jun 26 2022 at 21:04):

looking at https://github.com/leanprover-community/mathport/blob/master/Mathport/Syntax/Parse.lean my guess is they wrote a custom parser for lean3, but then they're using Lean.Expr for lean 4

Mario Carneiro (Jun 26 2022 at 21:41):

@michal wallace (tangentstorm) To implement that function more efficiently than by unary recursion, you can just do repeated halving of the input number.

def ofNat (n:Nat) : Bin :=
  if h : n = 0 then B else
  have : n / 2 < n :=
    Nat.div_lt_self ((Nat.eq_zero_or_pos _).elim (fun h' => (h h').elim) id) (by decide)
  if n % 2 = 1 then O (ofNat (n / 2)) else I (ofNat (n / 2))

michal wallace (tangentstorm) (Jun 27 2022 at 00:24):

@Mario Carneiro aha! yeah, that's exactly what i wanted to write, but I didn't realize I had access to % (I was looking for something named even or odd). Thanks!

michal wallace (tangentstorm) (Jun 27 2022 at 14:42):

Henrik Böving said:

No it is a function the Lean compiler is generating automatically on things out of Type u, what things are you seeing that break?

So in lean 4, casesOn seems to require an extra parameter: (motive =: fun _ => T)... For Data/Num/Basic, T was either Num or Ordering... I started a new thread for the required mathport changes.


Last updated: Dec 20 2023 at 11:08 UTC