Zulip Chat Archive

Stream: new members

Topic: How to encode a tree in a Std.HashMap and prove no cycles?


aron (Mar 02 2025 at 21:38):

I'd like to encode a tree in a Std.HashMap. I.e to use it to store IDs that either point to other IDs in the map or to some leaf node. Something like this:

abbrev NodeId := String

def TreeMap := Std.HashMap NodeId (Sum NodeId Int)

But I want to:

  1. guarantee that any NodeId in a value is also present in the same map as a key
  2. guarantee that there are no cycles: for any starting NodeId in the map you can follow it from key to value and back to key and it will always terminates in an .inr Int. There are no cycles of NodeIds referencing each other

Presumably I'd need to wrap the Std.HashMap in a subtype/refinement type with my two properties contained explicitly as proofs. But I don't know how I'd prove these two properties. How would you recommend I do this?

Aaron Liu (Mar 02 2025 at 22:20):

Here's a (honestly, not very good) way to do it:

import Std.Data.HashMap.Basic

abbrev NodeId := String

abbrev TreeMap := Std.HashMap NodeId (Sum NodeId Int)

def TreeMap.derivedSeries (t : TreeMap) (nid : NodeId) : Nat  Option NodeId
| 0 => some nid
| n + 1 => t.derivedSeries nid n >>= t.get? >>= Sum.getLeft?

def TreeMap.valid (t : TreeMap) : Prop :=
  ( nid h₁ h₂, (t.get nid h₁).getLeft h₂  t) 
  ( nid,  n, t.derivedSeries nid n = none)

def ValidTreeMap := Subtype TreeMap.valid

Robin Arnez (Mar 03 2025 at 08:16):

The easiest way to make a no-cycle predicate is by using an inductive because they automatically can't have cycles, e.g.:

import Std.Data.HashMap.Basic

abbrev NodeId := String
abbrev MapTree' := Std.HashMap NodeId (NodeId  Int)

inductive IsValidKey (m : MapTree') : NodeId  Prop where
  | mk (k : NodeId) (hmem : k  m) (h :  a  m[k].getLeft?, IsValidKey m a) : IsValidKey m k

def MapTree := { t : MapTree' //  k  t, IsValidKey t k }

aron (Mar 05 2025 at 14:48):

Thanks @Robin Arnez, this does encode both properties I want!

aron (Mar 05 2025 at 14:56):

But I've been finding it quite difficult to figure out the proofs for adding a new leaf or branch node... and the theorems for Std.HashMap don't seem to be quite as fleshed out as I was hoping. Any chance you could give me some tips on how to make some progress?

For example, is there a way to get t[k] = x -> k ∈ t? Doing t[k] relies on an implicit proof that k ∈ t, so it works in IsValidKey.mk because the proof is there in the form of hmem, but I don't know how to get the proof back out if I only have the proposition t[k] = x in my proof state

Robin Arnez (Mar 05 2025 at 15:01):

If you have h : t[k] = x then you can do change t[k]'?my_proof = _ and have h' := ?my_proof. What happens here is that 1. t[k]'h is notation for explicitly giving the proof and 2. change assigns the meta-variable ?my_proof (note the question mark) here which you can use later. Note that meta-variables don't really act like variables but rather like placeholders that are replaced with the actual value.

Robin Arnez (Mar 05 2025 at 15:01):

But I still find it quite impossible to believe you'd not have the proof already in some capacity.

Aaron Liu (Mar 05 2025 at 15:09):

You can use generalize_proofs (defined in file#Tactic/GeneralizeProofs)

Robin Arnez (Mar 05 2025 at 15:09):

Yeah that's another alternative if you're okay with using mathlib.

aron (Mar 05 2025 at 15:12):

mm ok those are helpful tips! I know that I'm only aware of a tiny fraction of the tactics lean offers, so suggestions like these are very useful to me!

aron (Mar 05 2025 at 15:14):

@Robin Arnez I'll try your suggestion when I get back to it. I think there were other goals I couldn't figure out but this is the one I struggled with the most

Robin Arnez (Mar 05 2025 at 19:40):

aron schrieb:

the theorems for Std.HashMap don't seem to be quite as fleshed out as I was hoping

Make sure you import Std.Data.HashMap.Lemmas, there should be a ton of lemmas.

Johannes Tantow (Mar 05 2025 at 19:53):

If you open the Std.Data.HashMap.Basic file, you can also see which operations are verified and which are currently not (inside the unverified section).

Robin Arnez (Mar 05 2025 at 20:38):

Here is a simple implementation of the addLeaf and addBranch (note: I changed the definition because your definition resulted in a tree where each node can only have one child):

import Std.Data.HashMap.Lemmas

abbrev NodeId := String

inductive Entry where
  | leaf (value : Int)
  | branch (l : NodeId) (r : NodeId)

instance : Membership NodeId Entry where
  mem
  | .leaf _, _ => False
  | .branch l r, k => k = l  k = r

@[simp] theorem Entry.mem_leaf_eq (x : Int) (k : NodeId) : (k  Entry.leaf x) = False := rfl
@[simp] theorem Entry.mem_branch_eq (l r : NodeId) (k : NodeId) : (k  Entry.branch l r) = (k = l  k = r) := rfl

abbrev MapTree' := Std.HashMap NodeId Entry

inductive IsValidKey (m : MapTree') : NodeId  Prop where
  | intro (k : NodeId) (hmem : k  m) (h :  a  m[k], IsValidKey m a) : IsValidKey m k

theorem IsValidKey.mem {m : MapTree'} {k : NodeId} : IsValidKey m k  k  m
  | ⟨_, mem, _⟩ => mem

theorem IsValidKey.insertIfNew {t : MapTree'} {k : NodeId} (h : IsValidKey t k) (k' : NodeId) (e : Entry) :
    IsValidKey (t.insertIfNew k' e) k := by
  induction h with
  | intro k hmem h' ih =>
    refine ⟨_, ?_, ?_⟩
    · simp; right; assumption
    · intro a ha
      apply ih
      simp [Std.HashMap.getElem_insertIfNew] at ha
      have : ¬(k' = k  ¬k'  t) := fun h₁, h₂ => h₂ (h₁  h)
      simpa [this] using ha

theorem IsValidKey.forall_insertIfNew {t : MapTree'} (ht :  a  t, IsValidKey t a) (k : NodeId) (e : Entry) (he :  a  e, a  t) :
     a  t.insertIfNew k e, IsValidKey (t.insertIfNew k e) a := by
  intro k'
  simp only [Std.HashMap.mem_insertIfNew, beq_iff_eq]
  by_cases h : k'  t
  · intro _
    exact (ht k' h).insertIfNew k e
  · intro h'; replace h' := h'.resolve_right h
    subst h'
    constructor
    · simp [Std.HashMap.getElem_insertIfNew, h]
      intro a ha
      exact (ht a (he a ha)).insertIfNew k e
    · simp

def MapTree := { t : MapTree' //  k  t, IsValidKey t k }

def MapTree.addLeaf (t : MapTree) (k : NodeId) (v : Int) : MapTree :=
  t.1.insertIfNew k (.leaf v), IsValidKey.forall_insertIfNew t.2 k _ (by simp)

def MapTree.addBranch (t : MapTree) (k : NodeId) (l : NodeId) (r : NodeId) (hl : l  t.1) (hl : r  t.1) : MapTree :=
  t.1.insertIfNew k (.branch l r), IsValidKey.forall_insertIfNew t.2 k _ (by simp_all)

Robin Arnez (Mar 05 2025 at 20:41):

Note that the definition here is more of an acyclic graph than a tree (in other words, one node can have multiple parents and there can be multiple roots). If you really want a "tree" tree than just use the inductive

inductive Tree where
  | leaf (v : Int)
  | branch (l : Tree) (r : Tree)

aron (Mar 09 2025 at 10:26):

@Robin Arnez amazing, that's exactly what I was looking for!

And yes you're right this is a DAG not a tree, but that's what I want so this is perfect.

Now I need to try and understand how this all works so I can adapt it to the slightly different data structure I'm actually working with...

aron (Mar 15 2025 at 20:57):

@Robin Arnez, I'm using your approach in my attempt to implement a type system. I'm doing this by using unification variables as the map keys and type constraints as the map values.

Now, I'm trying to traverse the tree map to "zonk" all the types in the map but I can't quite figure out how to prove the last missing piece: that any unification vars referenced in a value in the map are also guaranteed to be present in that map.

I've tried to cut down and simplify my code as much as I can but there's still a fair bit of context I couldn't reduce. Could you help me figure this out? :pray:

import Std.Data.HashMap
open Std





def HashMap.attach {k v : Type u} [DecidableEq k] [Hashable k] (map : HashMap k v) : List {key : k // key  map} :=
  map.keys.attach.map fun key, hkey => key, HashMap.mem_keys.mp hkey



structure TypeVar where
  index : Nat
  deriving DecidableEq

structure UniVar where
  index : Nat
  deriving Hashable, DecidableEq





inductive Primitive where
  | unit
  | string
  | int
  | bool
  | nat
  deriving DecidableEq

inductive SimpleType where
  | primitive : Primitive  SimpleType
  | typeCtor : String  List UniVar  SimpleType





instance instMembershipUniVarSimpleType : Membership UniVar SimpleType where
  mem container item :=
    match container with
    | .primitive _ => False
    | .typeCtor _ uniVars => uniVars.contains item


instance instMembershipUniVarOptionSimpleType : Membership UniVar (Option SimpleType) where
  mem container item :=
    match container with
    | .some t => item  t
    | .none => False


instance instMembershipUniVarPrimitive : Membership UniVar Primitive where
  mem _ _ := False


instance {a} : Membership UniVar (HashMap UniVar a) where
  mem container item :=
    container.contains item


instance : Membership SimpleType (Option SimpleType) where
  mem container item :=
    match container with
    | .some t => item = t
    | .none => False








@[simp] theorem SimpleType.mem_leaf_eq (x : Primitive) (k : UniVar) : (k  SimpleType.primitive x) = False := rfl

@[simp] theorem SimpleType.mem_typeCtor_eq {ctor : String} (uniVars : List UniVar) (k : UniVar) : (k  SimpleType.typeCtor ctor uniVars) = (k  uniVars) := by
  simp [instMembershipUniVarSimpleType]






abbrev SimpleTypeMap' := HashMap UniVar (Option SimpleType)







mutual

/-- The inductive predicate that a uniVar is a valid key in a `SimpleTypeMap'`, that every univar at a value in the map is also a valid key in the map, and that there are no cycles of univars looping round to point to each other! -/
inductive IsValidUniVarKey (m : SimpleTypeMap') : UniVar  Prop where
  | mk (uniVar : UniVar)
       (hmem : uniVar  m)
       (h :  t  m[uniVar]'hmem, ContainsValidUniVars m t) : IsValidUniVarKey m uniVar


/-- The predicate that a type is in the map and that all univars in that type are also in the map -/
inductive ContainsValidUniVars (m : SimpleTypeMap') : SimpleType  Prop where
  | mk (t : SimpleType)
       (h :  uv  t, IsValidUniVarKey m uv) : ContainsValidUniVars m t

end





def ContainsValidUniVars.mkFromPrimitive (m : SimpleTypeMap') (p : Primitive) : ContainsValidUniVars m (SimpleType.primitive p) := by
  refine .mk (SimpleType.primitive p) ?_
  intro uv huv
  simp only [instMembershipUniVarSimpleType] at huv


def ContainsValidUniVars.mkFromKnownPrimitive (m : SimpleTypeMap') {t : SimpleType} (p : Primitive) (h : t = SimpleType.primitive p) : ContainsValidUniVars m t := by
  rw [h]
  exact .mkFromPrimitive m p



def ContainsValidUniVars.mkFromEmptyCtor (ctor : String) {t : SimpleType} (m : SimpleTypeMap') (h : t = SimpleType.typeCtor ctor []) : ContainsValidUniVars m t := by
  rw [h]
  refine .mk (SimpleType.typeCtor ctor []) ?_
  intro uv huv
  simp [instMembershipUniVarSimpleType] at huv









@[simp] theorem SimpleType.uniVar_nin_primitive {uv : UniVar} {p : Primitive} : (uv  SimpleType.primitive p) = False := by
  simp [instMembershipUniVarSimpleType]


@[simp] theorem SimpleType.uniVar_in_typeCtor {uv : UniVar} {ctor : String} {uniVars : List UniVar} : (uv  SimpleType.typeCtor ctor uniVars) = (uv  uniVars) := by
  simp [instMembershipUniVarSimpleType]





def SimpleTypeMap := { map : SimpleTypeMap' //  uv  map, IsValidUniVarKey map uv }






inductive SimpleZonkedType where
  | primitive : Primitive  SimpleZonkedType
  | typeVar : TypeVar  SimpleZonkedType
  | typeCtor : String  List SimpleZonkedType  SimpleZonkedType


abbrev SimpleZonkedTypeMap := HashMap UniVar SimpleZonkedType









/-- We go through the map one univar key at a time, and zonk it until it can no longer be zonked. At which point by the end we should have a map of fully zonked types, which should be trivially convertible to a `SimpleZonkedTypeMap` I believe. -/
def zonkSimpleTypeMap (uniVarsMap : SimpleTypeMap) : SimpleZonkedTypeMap :=
  let firstTypeVar : TypeVar := .mk 0

  let allUniVars := uniVarsMap.1 |> HashMap.attach
  let map :=
    allUniVars.foldl (init := (HashMap.empty, firstTypeVar)) (fun (acc, accTypeVar) uv, uvInMap =>
      let isValidUniVarKey := uniVarsMap.2 uv uvInMap
      let zonked := zonkSingleUv uniVarsMap uv accTypeVar isValidUniVarKey

      (acc.insert uv zonked.1, zonked.2)
    )
  map.1
where
  zonkSingleUv (uniVarsMap : SimpleTypeMap) (uv : UniVar) (nextTypeVar : TypeVar) (uh : IsValidUniVarKey uniVarsMap.1 uv) : SimpleZonkedType × TypeVar :=
    let hmem :=
      match uh with
      | IsValidUniVarKey.mk _ hmem _ => hmem

    match hh : uniVarsMap.1[uv]'hmem with
    | none =>
      (SimpleZonkedType.typeVar nextTypeVar, { nextTypeVar with index := nextTypeVar.index + 1 })

    | some someVal =>
      by
      refine zonkSimpleType uniVarsMap someVal ?_ nextTypeVar
      cases hhh : someVal with
      | primitive p =>
        refine .mkFromKnownPrimitive uniVarsMap.1 p ?_
        rfl


      | typeCtor tag uniVars =>
        cases univarsPrf : uniVars with
        | nil =>
          refine .mkFromEmptyCtor tag uniVarsMap.1 ?_
          rewrite [univarsPrf] at hhh
          rfl

        | cons hd tl =>
          have tt :  t  uniVarsMap.1[uv], ContainsValidUniVars uniVarsMap.1 t:=
            match uh with
            | IsValidUniVarKey.mk _ _ tt => tt

          rw [ univarsPrf,  hhh]
          constructor
          intro thisUv huv
          simp [instMembershipUniVarOptionSimpleType] at huv
          sorry




  zonkSimpleType (uniVarsMap : SimpleTypeMap) (t : SimpleType) (h : ContainsValidUniVars uniVarsMap.1 t) (nextTypeVar : TypeVar) : SimpleZonkedType × TypeVar :=
    match hh : t with
    | .primitive p => (SimpleZonkedType.primitive p, nextTypeVar)
    | .typeCtor ctor uniVars =>
      let zonkedArgs :=
        uniVars.attach.foldl (init := ([], nextTypeVar))
          fun (acc, accTypeVar) thisUv, thisUvInUniVars =>
          by
          let zonked := zonkSingleUv uniVarsMap thisUv accTypeVar ?_
          refine (zonked.1 :: acc, zonked.2)
          have  _, huv  := h
          have : thisUv  SimpleType.typeCtor ctor uniVars  IsValidUniVarKey uniVarsMap.1 thisUv := huv thisUv
          simp [SimpleType.uniVar_in_typeCtor] at this
          exact this thisUvInUniVars

      (SimpleZonkedType.typeCtor ctor zonkedArgs.1, zonkedArgs.2)

I've put a sorry at the location in the code I couldn't work out how to prove

aron (Mar 15 2025 at 21:00):

The actual error that Lean gives me is a fail to show termination error but I suspect that if I can actually finish the proof, that Lean will figure out that the mutually recursive calls of zonkSimpleTypeMap.zonkSingleUv and zonkSimpleTypeMap.zonkSimpleType will terminate due to structural recursion on IsValidUniVarKey and ContainsValidUniVars

Robin Arnez (Mar 16 2025 at 08:10):

This works:

New code

aron (Mar 16 2025 at 16:11):

@Robin Arnez that does seem to work! thank you!

But my actual code is slightly different and I'm getting an error on line 99 of your code snippet that I can't figure out how to solve. In my code the type of IsValidUniVarKey.children is

IsValidUniVarKey.children {m u}
  (h : IsValidUniVarKey m u)
  :  t  m.outerMap[u]'h.mem |>.2,  uv  t, IsValidUniVarKey m uv

I don't think the change from m to m.outerMap is an issue, because I can just replace that and I think it's all the same. But I think the change from ∀ t ∈ <expr> to ∀ t ∈ <expr> |>.2 is the cause of the problems I'm getting. The issue I get is

application type mismatch
  IsValidUniVarKey.children uh ?m.34287 ?m.35004 thisUv hh''
argument
  hh''
has type
  thisUv  uniVars : Prop
but is expected to have type
  thisUv  ?m.34287 : Prop

and when I replace hh'' with a placeholder ?_ in a refine tactic, then I get an error about hh also:

application type mismatch
  IsValidUniVarKey.children uh ?m.34287 hh
argument
  hh
has type
  (outerMap uniVarsMap)[uv].2 = some (SimpleType.typeCtor ctor uniVars) : Prop
but is expected to have type
  ?m.34287  (outerMap uniVarsMap)[uv].2 : Prop

I can't figure out how to solve this issue on my own because the types of hh and hh'' that we're passing to uh.children don't seem to match its expected parameters so I don't know how to adapt the proofs accordingly. E.g. we're passing hh as the 2nd param, but when I hover over uh.children it tells me that its 2nd (non-implicit) param is (t : SimpleType), not uniVarsMap.1.outerMap[uv].2 = some (SimpleType.typeCtor ctor uniVars) which is the type of hh.

And in your code snippet, uh.children's 4th param seems to be t ∈ m[u], but we're passing it hh'' whose type is thisUv ∈ uniVars. How does that match the expected type? Is lean automatically coercing thisUv ∈ uniVars to t ∈ m[u] because it knows that thisUv ∈ uniVars iff t ∈ m[u]? If so, how do I tell lean that in my code likewise thisUv ∈ uniVars iff t ∈ m.outerMap[u].2?

aron (Mar 16 2025 at 17:15):

Ok I've simplified your snippet to change the h proof inside IsValidUniVarKey from:

 t  m[uniVar],  uv  t, IsValidUniVarKey m uv

to

 uv  m[uniVar], IsValidUniVarKey m uv

Simplified code

I did this because t ∈ m[uniVar] is unnecessary, since m[uniVar] doesn't contain a t, it is a t, so hopefully after I do this simplification in my own code it'll be clearer to me how to provide the missing proofs to uh.children

Robin Arnez (Mar 16 2025 at 17:50):

m[...] was an Option, wasn't it? DON'T abuse membership instances like that. Every types should have exactly one type that it "contains" or you're gonna get all sorts of problems. That's why I removed pretty much all instances in my version as well.

Robin Arnez (Mar 16 2025 at 17:50):

Option T already has a canoncial membership of type T, so adding another will cause problems

aron (Mar 16 2025 at 18:18):

I don't think I'm overloading membership instances in my current code?

These are the only ones I've got atm (live lean link):

instance : Membership UniVar SimpleType where
  mem container item :=
    match container with
    | .primitive _ => False
    | .typeCtor _ uniVars => uniVars.contains item

instance : Membership UniVar (Option SimpleType) where
  mem container item :=
    match container with
    | .some t => item  t
    | .none => False

instance : Membership UniVar Primitive where
  mem _ _ := False

instance {a} : Membership UniVar (HashMap UniVar a) where
  mem container item :=
    container.contains item

instance {a} : Membership UniVar (a × Option SimpleType) where
  mem container item := item  container.2

I don't think there's any ambiguity in these instances, is there?

although I see what you mean about Option T already having a membership instance so I don't need to make my own. I'll just remove that one then. My instance doesn't really do anything funky with it anyway so I don't think that would've messed anything up :man_shrugging:

aron (Mar 16 2025 at 18:20):

either way I don't think the membership instances are what caused my adaptation of your code to not work

Aaron Liu (Mar 16 2025 at 18:32):

Membership UniVar (Option SimpleType) overloads Membership SimpleType (Option SimpleType) (from docs#Option.instMembership).

aron (Mar 16 2025 at 18:47):

Oh hmm I see. Tbh this feel like a strange restriction to me. I'd have expected that if I have a membership instance of an A in a B, and the generic Option t has its own membership instance, then the membership of A in a Option B should just work transitively the same way that A in B works

aron (Mar 16 2025 at 18:52):

I think my expectation would work fine if you could add a type annotation on a forall membership declaration, but since you can't you have to rely on whatever lean infers as the member item's type, which yeah, doesn't really play nicely with what I want to do here :thinking:

Kevin Buzzard (Mar 16 2025 at 19:10):

Yeah docs#Membership is a class and alpha (the "element" type) is an outparam so this instance isn't going to go well with our current setup.

Kevin Buzzard (Mar 16 2025 at 19:13):

To be honest I personally think that Option.instMembership is a daft instance, my mental model of Option isn't "the empty set and then a bunch of one-element sets", it's "the original type plus a new term". But maybe this notation is standard in some areas? How much is this used?

Robin Arnez (Mar 16 2025 at 19:24):

Well, not sure how it's used but I feel like the membership instance makes sense if you think about the connection between List and Option with map, filter and others. I mean I guess there are a few ways to think about a type with one additional element, a list/set with at most one element just being one of them. Still, the current api doesn't really lean into the analogy, e.g. EmptyCollection and Singleton (in theory Subset / Inter) don't exist and membership is mostly just simplified back into a equality.

aron (Mar 16 2025 at 19:31):

I think my expectation would work fine if you could add a type annotation on a forall membership declaration, but since you can't you have to rely on whatever lean infers as the member item's type

oh I found a cheeky way to do this actually, because by looking at the code for the notation I realised that ∀ x ∈ m, Pred x m is actually syntax sugar for ∀ x, x ∈ m → Pred x m, and in that syntax you can just add a type annotation on x:

 (x : T), x  m  Pred x m

(sorry @Robin Arnez :grimacing:)

Robin Arnez (Mar 16 2025 at 19:35):

aron schrieb:

Ok I've simplified your snippet to change the h proof inside IsValidUniVarKey from:

 t  m[uniVar],  uv  t, IsValidUniVarKey m uv

to

∀ uv ∈ m[uniVar], IsValidUniVarKey m uv

Is that really a simplification though? I feel like you just have more effort with the proof in zonkSingleUv

aron (Mar 16 2025 at 21:54):

well atm I can't adapt the proof to my code at all so idk if it would make a difference either way :sob:


Last updated: May 02 2025 at 03:31 UTC