Zulip Chat Archive

Stream: lean4

Topic: Need suggestions to improve a DSL for dimensional calculus

Nicolas Rouquette (Jan 30 2022 at 00:55):

This is my first real lean4 program: https://github.com/NicolasRouquette/dimensional-calculus.lean4

I would like to document my journey in progressing from unproved object-oriented programming to proven functional programming style. I equate the former to the kind of lean4 code someone familiar with Java might write (well, perhaps a wee bit more than Java, perhaps oo-style Scala programming) and the latter to the kind of lean4 code that comes with verified proofs of useful properties.

This DSL domain, dimensional analysis, is mathematically very simple: equations involving variables combined using multiplications, divisions and rational powers of such. Using the CADE 2021 paper and the BoolExpr from the doc, I managed to make a basic DSL for this domain.

My questions are the following:

1) The DSL is not very pretty (look at Main.lean); it is unclear to me how to make it more elegant where I could parse a file of dimensional calculus formulas.

2) How should I organize my code w.r.t convenience functions like these below.

At some point, it does not make sense to bloat the standard library with such convenience functions.
Yet, these may be useful in a particular application. Where should I put these in a way that a reasonable reader might find them instead of adding them where I need them?

-- TODO: Where should these HashSet extensions be defined?

def ofList [BEq α] [Hashable α] (l : List α) : HashSet α :=
  l.foldl (init := HashSet.empty) (fun m p => m.insert p)

def union [BEq α] [Hashable α] (h1: HashSet α) (h2: HashSet α) : HashSet α :=
  h1.fold (init := h2) (fun h p => h.insert p)

def containsAll [BEq α] [Hashable α] (sup: HashSet α) (sub: HashSet α) : Bool :=
  sub.fold (init := true) (fun b p => b && sup.contains p)

3) I have seen in the standard library comments like this:

-- TODO: remove `partial` using well-founded recursion

I also have 3 similar cases:

partial def applyMul (fs1: DCalcFactors) (fs2: DCalcFactors) : DCalcFactors
partial def applyDiv (fs1: DCalcFactors) (fs2: DCalcFactors) : DCalcFactors
partial def substitute (ctx: Context) (fs: DCalcFactors) : DCalcFactors

How do we go about making these recursive functions 'well-founded' so that they would be total recursive functions?

I suspect that the answer depends, in part, in choosing the right kind of data structure that makes it easier for lean4 to reason about structural recursion.

4) Proving some properties about the program.

Is (3) necessary to prove properties about some data structures like DCalcFactors?

I have been looking for examples in the lean4 source code and found something interesting in HashSet.lean:

inductive WellFormed [BEq α] [Hashable α] : HashSetImp α → Prop where
  | mkWff     : ∀ n,                  WellFormed (mkHashSetImp n)
  | insertWff : ∀ m a, WellFormed m → WellFormed (insert m a)
  | eraseWff  : ∀ m a, WellFormed m → WellFormed (erase m a)


def HashSet (α : Type u) [BEq α] [Hashable α] :=
  { m : HashSetImp α // m.WellFormed }

As the name suggests, it seems that HashSetImpl is the implementation of HashSet.
I am not entirely sure that I understand the elegance of formalizing well-formedness for hashsets.
Can someone explain this construction and proof?

Nicolas.

Mario Carneiro (Jan 30 2022 at 02:21):

How do we go about making these recursive functions 'well-founded' so that they would be total recursive functions?

You haven't shown the definition of the function. The way to prove a definition well founded depends primarily on the ways in which you make recursive calls, and the structure of the domain type (here DCalcFactors). It's pretty difficult to give advice without this

Mario Carneiro (Jan 30 2022 at 02:27):

Oh, you left a link. To prove a function is well founded, you need a "variant" that decreases on every recursive call. In the case of applyMul, a suitable variant is fs2.size. You would have to prove that fs2.size != 0 -> (HashMap.ofList fs2.toList.tail!).size < fs2.size, which in theory should be combining some lemmas from the standard library

Mario Carneiro (Jan 30 2022 at 02:28):

The other two functions have exactly the same recursion structure, so you could use the same proof

Mario Carneiro (Jan 30 2022 at 02:30):

Is (3) necessary to prove properties about some data structures like DCalcFactors?

Yes, more or less. As far as the logic is concerned, any partial function can have any behavior consistent with the type given to it. So for example there is no way you can prove that applyMul is not the same as def applyMul (fs1 fs2 : DCalcFactors) := fs1

Mario Carneiro (Jan 30 2022 at 02:31):

Note that Lean 4 core is not overly concerned with proving correctness of its data structures. You are better off looking at mathlib4 if you are interested in that sort of thing

Mario Carneiro (Jan 30 2022 at 02:33):

For example, see BinaryHeap.lean which is a binary heap implementation which does not use partial by using termination_by where necessary

Mario Carneiro (Jan 30 2022 at 02:37):

For (2) your best bet is to put the functions in mathlib4. Most of us also have a for_mathlib file or directory in our projects containing definitions and lemmas that aren't really project specific and really ought to be PR'd to mathlib but we haven't got around to it yet

Nicolas Rouquette (Jan 30 2022 at 19:08):

Mario Carneiro said:

Oh, you left a link. To prove a function is well founded, you need a "variant" that decreases on every recursive call. In the case of applyMul, a suitable variant is fs2.size. You would have to prove that fs2.size != 0 -> (HashMap.ofList fs2.toList.tail!).size < fs2.size, which in theory should be combining some lemmas from the standard library

Thanks for the suggestion, I understand the idea:

show that the recursive calls are monotonically decreasing as a function of the size of the 2nd argument.

What I do not understand is how to express this in lean4; I tried your suggestion after refactoring
the code to make sure that both recursive calls use the same monotonically decreasing subset of the 2nd argument.

def applyMul (fs1: DCalcFactors) (fs2: DCalcFactors) : DCalcFactors :=
  let l2 : List DCalcFactor := fs2.toList
  let f2 := l2.head!
  let f2tail := HashMap.ofList l2.tail!
  if 0 == fs2.size then
    fs1
  else
    match fs1.getOp f2.fst with
    | none =>
      let fs1b := fs1.insert f2.fst f2.snd
      applyMul fs1b f2tail  -- tactic 'assumption' failed
    | some (f1a : Lean.Rat) =>
      let f12 : DCalcFactor := ⟨ f2.fst, f1a + f2.snd ⟩
      let fs1a := fs1.erase f2.fst
      let fs1b := if 0 == f12.snd.num then fs1a else fs1a.insert f12.fst f12.snd
      applyMul fs1b f2tail
termination_by _ => fs2.size != 0 -> (HashMap.ofList fs2.toList.tail!).size < fs2.size

That there is no error on the 2nd match clause seems to be a consequence that the type checker stops at the 1st error.
Indeed, if I switch the order of the clauses, I get a similar error.

I wanted to write the termination criteria by referring to one of the let bindings, similar to what I've seen in Lean's
Init/Data/Array/Basic.lean:

def reverse (as : Array α) : Array α :=
  let n   := as.size
  let mid := n / 2
  let rec rev (as : Array α) (i : Nat) :=
    if h : i < mid then
      rev (as.swap! i (n - i - 1)) (i+1)
    else
      as
  rev as 0
termination_by _ => mid - i

How is it that Lean4 in the above accepts a reference to a let binding, i.e., mid, but in my case it does not:

def applyMul (fs1: DCalcFactors) (fs2: DCalcFactors) : DCalcFactors :=
  let l2 : List DCalcFactor := fs2.toList
  let f2 := l2.head!
  let f2tail := HashMap.ofList l2.tail!
  let f2tailsize := f2tail.size
  if 0 == fs2.size then
    fs1
  else
    match fs1.getOp f2.fst with
    | some (f1a : Lean.Rat) =>
      let f12 : DCalcFactor := ⟨ f2.fst, f1a + f2.snd ⟩
      let fs1a := fs1.erase f2.fst
      let fs1b := if 0 == f12.snd.num then fs1a else fs1a.insert f12.fst f12.snd
      applyMul fs1b f2tail -- tactic 'assumption' failed
    | none =>
      let fs1b := fs1.insert f2.fst f2.snd
      applyMul fs1b f2tail  -- tactic 'assumption' failed
--termination_by _ => fs2.size != 0 -> (HashMap.ofList fs2.toList.tail!).size < fs2.size
--termination_by _ => fs2.size != 0 -> f2tail.size < fs2.size -- unknown identifier 'f2tail.size'
termination_by _ => fs2.size != 0 -> f2tailsize < fs2.size -- unknown identifier 'f2tailsize'

I tried to minic the pattern used in reverse without success.

On the other hand, I managed to make this variant work:

def applyMulAux (fs1: DCalcFactors) (l2: List DCalcFactor): DCalcFactors :=
  match l2 with
  | List.nil =>
    fs1
  | List.cons f2 t2 =>
    match fs1.getOp f2.fst with
    | some (f1a : Lean.Rat) =>
      let f12 : DCalcFactor := ⟨ f2.fst, f1a + f2.snd ⟩
      let fs1a := fs1.erase f2.fst
      let fs1b := if 0 == f12.snd.num then fs1a else fs1a.insert f12.fst f12.snd
      applyMulAux fs1b t2
    | none =>
      let fs1b := fs1.insert f2.fst f2.snd
      applyMulAux fs1b t2

def applyMul (fs1: DCalcFactors) (fs2: DCalcFactors) : DCalcFactors :=
  applyMulAux fs1 fs2.toList

I would like to know what would it take to make the original version w/ a HashMap work as a well-founded recursion.

Nicolas.

Mario Carneiro (Jan 30 2022 at 20:44):

The syntax is a bit off in your first snippet. What I was suggesting was something like this:

def applyMul (fs1: DCalcFactors) (fs2: DCalcFactors) : DCalcFactors :=
  let l2 : List DCalcFactor := fs2.toList
  let f2 := l2.head!
  let f2tail := HashMap.ofList l2.tail!
  if h : fs2.size = 0 then
    fs1
  else
    have : f2tail.size < fs2.size :=
      have : fs2.size ≠ 0 → (HashMap.ofList fs2.toList.tail!).size < fs2.size := sorry
      this h
    match fs1.getOp f2.fst with
    | none =>
      let fs1b := fs1.insert f2.fst f2.snd
      applyMul fs1b f2tail  -- tactic 'assumption' failed
    | some (f1a : Lean.Rat) =>
      let f12 : DCalcFactor := ⟨ f2.fst, f1a + f2.snd ⟩
      let fs1a := fs1.erase f2.fst
      let fs1b := if 0 == f12.snd.num then fs1a else fs1a.insert f12.fst f12.snd
      applyMul fs1b f2tail
termination_by _ fs1 fs2 => fs2.size

Mario Carneiro (Jan 30 2022 at 20:44):

that is, fs2.size ≠ 0 → (HashMap.ofList fs2.toList.tail!).size < fs2.size is what you need to prove to make this termination argument work

Mario Carneiro (Jan 30 2022 at 20:45):

This code is somewhat inefficient in repeatedly converting a hashmap into a list and back again on every iteration, so I would expect your second approach to be more efficient in addition to being much easier to prove termination

Mario Carneiro (Jan 30 2022 at 20:49):

By the way, I see you use if 0 == value a lot. I guess you learned to use these "yoda conditionals" in java land, but in lean there really isn't any need to do this at all, because assignment doesn't use = and isn't an expression anyway - there is no way you could write an assignment inside an if conditional even if you wanted to

Mario Carneiro (Jan 30 2022 at 20:51):

It's also useful when capturing hypotheses about if-statements to write them idiomatically for the lemmas they will be used with. I changed the first if statement to if h : fs2.size = 0 so that the else branch gets an assumption fs2.size ≠ 0, which is necessary for the proof of the sorry

Mario Carneiro (Jan 30 2022 at 20:52):

@Nicolas Rouquette

Ruben Van de Velde (Jan 30 2022 at 21:10):

You should only use yoda comparisons like 0 < value :)

Patrick Massot (Jan 30 2022 at 21:16):

We should rename linter.ge_or_gt to linter.yoda.

Nicolas Rouquette (Jan 31 2022 at 00:10):

Mario Carneiro said:

The syntax is a bit off in your first snippet. What I was suggesting was something like this:
def applyMul (fs1: DCalcFactors) (fs2: DCalcFactors) : DCalcFactors :=
...
termination_by _ fs1 fs2 => fs2.size

Hum, based on what I see in mathlib4, I get the feeling that there are at least two ways to prove properties of functions (recursive or not):

1) Add the termination_by _ <arguments> => .... clause
2) Use an auxiliary function where the return carries a proof

def applyMulAux (fs1: DCalcFactors) (fs2: DCalcFactors) : { fs2': DCalcFactors // -- invariant property about fs2 and fs2' } :=
...

def applyMul (fs1: DCalcFactors) (fs2: DCalcFactors) : DCalcFactors :=
  applyMulAux fs1 fs2

If this is more or less correct, is this strategy explained in the lean4 doc somewhere?

Nicolas Rouquette (Jan 31 2022 at 00:41):

Mario Carneiro said:

For (2) your best bet is to put the functions in mathlib4. Most of us also have a for_mathlib file or directory in our projects containing definitions and lemmas that aren't really project specific and really ought to be PR'd to mathlib but we haven't got around to it yet

Interesting, where can I find an example of a for_mathlib folder?

I understand that proving properties of functions depends a great deal on the proofs already available about the data structures involved in such functions. In my case, HashMap seems very thin w.r.t proofs in lean4 compared to has_map in Lean3's mathlib.
With Lean4, I am exploring using BinaryHeap; however, I probably will need some extensions like contains.
I would like to follow your suggestion and test such extensions in my application before suggesting a PR.

Nicolas.

Mario Carneiro (Jan 31 2022 at 01:53):

The two mechanisms you mention are not mutually exclusive. You often need (2) in order to actually do the proof of (1)

Mario Carneiro (Jan 31 2022 at 01:58):

In this case you don't need it, but conceivably the way you recurse depends on the value you get back, in which case you have to know some property of the value. For example in heapifyDown we need to know that the returned heap has the same size as the original so that subsequent accesses to the heap don't go out of bounds

Mario Carneiro (Jan 31 2022 at 02:01):

I understand that proving properties of functions depends a great deal on the proofs already available about the data structures involved in such functions. In my case, HashMap seems very thin w.r.t proofs in lean4 compared to has_map in Lean3's mathlib.

Yes, I started looking into this and it is more than a few hours of work to add all the missing lemmas for hashmaps, so I decided not to get into it today. Some things might not even be provable, because of the use of constants like UInt32.toUSize in the implementation

Mario Carneiro (Jan 31 2022 at 02:03):

With Lean4, I am exploring using BinaryHeap; however, I probably will need some extensions like contains.
I would like to follow your suggestion and test such extensions in my application before suggesting a PR.

Contains? Binary heaps don't really support membership queries faster than linear search. If you want to do linear search you can do so by using heap.1 to access the underlying array

Kevin Buzzard (Jan 31 2022 at 07:40):

https://github.com/leanprover-community/lean-perfectoid-spaces/tree/master/src/for_mathlib is a for_mathlib directory FWIW

Nicolas Rouquette (Feb 01 2022 at 03:49):

Kevin Buzzard said:

https://github.com/leanprover-community/lean-perfectoid-spaces/tree/master/src/for_mathlib is a for_mathlib directory FWIW

Ah yes, though this is based on the lean3 code organization. For lean4, I am looking for lake-based configurations; I think I found two interesting examples for potential additions to mathlib4 and to lean4's library respectively:

https://github.com/lecopivo/SciLean/tree/master/SciLean/Mathlib
https://github.com/gebner/quote4/tree/master/Qq/ForLean

Nicolas Rouquette (Feb 01 2022 at 05:18):

Mario Carneiro said:

With Lean4, I am exploring using BinaryHeap; however, I probably will need some extensions like contains.
I would like to follow your suggestion and test such extensions in my application before suggesting a PR.

Contains? Binary heaps don't really support membership queries faster than linear search. If you want to do linear search you can do so by using heap.1 to access the underlying array

Right, I think you meant something like this:

import Mathlib.Data.BinaryHeap

namespace BinaryHeap

def contains {lt} [BEq α] (self : BinaryHeap α lt) (x : α) : Bool :=
  self.arr.contains x

end BinaryHeap

Mario Carneiro (Feb 01 2022 at 08:47):

Right. (FYI you can use self.1 in place of self.arr there, which is what I was alluding to)

Nicolas Rouquette (Feb 02 2022 at 05:27):

Mario Carneiro said:

Right. (FYI you can use self.1 in place of self.arr there, which is what I was alluding to)

Wow.. I did not realize that lean4 supports an index-based reference for class/structure fields!
I think this should be explained in the doc (https://leanprover.github.io/lean4/doc/struct.html)
Separately, can you explain how useful this trick has been to your lean4 programming practice?

Mario Carneiro (Feb 02 2022 at 05:43):

I use it all the time in lean 3, it saves me from having to look up the names and it's shorter too

Mario Carneiro (Feb 02 2022 at 05:44):

Since BinaryHeap is a newtype it's more natural to just use indexing notation since the field name is irrelevant

Last updated: May 02 2025 at 03:31 UTC