Zulip Chat Archive
Stream: new members
Topic: How to encode a tree in a Std.HashMap and prove no cycles?
aron (Mar 02 2025 at 21:38):
I'd like to encode a tree in a Std.HashMap
. I.e to use it to store IDs that either point to other IDs in the map or to some leaf node. Something like this:
abbrev NodeId := String
def TreeMap := Std.HashMap NodeId (Sum NodeId Int)
But I want to:
- guarantee that any
NodeId
in a value is also present in the same map as a key - guarantee that there are no cycles: for any starting
NodeId
in the map you can follow it from key to value and back to key and it will always terminates in an.inr Int
. There are no cycles ofNodeId
s referencing each other
Presumably I'd need to wrap the Std.HashMap
in a subtype/refinement type with my two properties contained explicitly as proofs. But I don't know how I'd prove these two properties. How would you recommend I do this?
Aaron Liu (Mar 02 2025 at 22:20):
Here's a (honestly, not very good) way to do it:
import Std.Data.HashMap.Basic
abbrev NodeId := String
abbrev TreeMap := Std.HashMap NodeId (Sum NodeId Int)
def TreeMap.derivedSeries (t : TreeMap) (nid : NodeId) : Nat → Option NodeId
| 0 => some nid
| n + 1 => t.derivedSeries nid n >>= t.get? >>= Sum.getLeft?
def TreeMap.valid (t : TreeMap) : Prop :=
(∀ nid h₁ h₂, (t.get nid h₁).getLeft h₂ ∈ t) ∧
(∀ nid, ∃ n, t.derivedSeries nid n = none)
def ValidTreeMap := Subtype TreeMap.valid
Robin Arnez (Mar 03 2025 at 08:16):
The easiest way to make a no-cycle predicate is by using an inductive because they automatically can't have cycles, e.g.:
import Std.Data.HashMap.Basic
abbrev NodeId := String
abbrev MapTree' := Std.HashMap NodeId (NodeId ⊕ Int)
inductive IsValidKey (m : MapTree') : NodeId → Prop where
| mk (k : NodeId) (hmem : k ∈ m) (h : ∀ a ∈ m[k].getLeft?, IsValidKey m a) : IsValidKey m k
def MapTree := { t : MapTree' // ∀ k ∈ t, IsValidKey t k }
aron (Mar 05 2025 at 14:48):
Thanks @Robin Arnez, this does encode both properties I want!
aron (Mar 05 2025 at 14:56):
But I've been finding it quite difficult to figure out the proofs for adding a new leaf or branch node... and the theorems for Std.HashMap
don't seem to be quite as fleshed out as I was hoping. Any chance you could give me some tips on how to make some progress?
For example, is there a way to get t[k] = x -> k ∈ t
? Doing t[k]
relies on an implicit proof that k ∈ t
, so it works in IsValidKey.mk
because the proof is there in the form of hmem
, but I don't know how to get the proof back out if I only have the proposition t[k] = x
in my proof state
Robin Arnez (Mar 05 2025 at 15:01):
If you have h : t[k] = x
then you can do change t[k]'?my_proof = _
and have h' := ?my_proof
. What happens here is that 1. t[k]'h
is notation for explicitly giving the proof and 2. change
assigns the meta-variable ?my_proof
(note the question mark) here which you can use later. Note that meta-variables don't really act like variables but rather like placeholders that are replaced with the actual value.
Robin Arnez (Mar 05 2025 at 15:01):
But I still find it quite impossible to believe you'd not have the proof already in some capacity.
Aaron Liu (Mar 05 2025 at 15:09):
You can use generalize_proofs
(defined in file#Tactic/GeneralizeProofs)
Robin Arnez (Mar 05 2025 at 15:09):
Yeah that's another alternative if you're okay with using mathlib.
aron (Mar 05 2025 at 15:12):
mm ok those are helpful tips! I know that I'm only aware of a tiny fraction of the tactics lean offers, so suggestions like these are very useful to me!
aron (Mar 05 2025 at 15:14):
@Robin Arnez I'll try your suggestion when I get back to it. I think there were other goals I couldn't figure out but this is the one I struggled with the most
Robin Arnez (Mar 05 2025 at 19:40):
aron schrieb:
the theorems for
Std.HashMap
don't seem to be quite as fleshed out as I was hoping
Make sure you import Std.Data.HashMap.Lemmas
, there should be a ton of lemmas.
Johannes Tantow (Mar 05 2025 at 19:53):
If you open the Std.Data.HashMap.Basic
file, you can also see which operations are verified and which are currently not (inside the unverified section).
Robin Arnez (Mar 05 2025 at 20:38):
Here is a simple implementation of the addLeaf
and addBranch
(note: I changed the definition because your definition resulted in a tree where each node can only have one child):
import Std.Data.HashMap.Lemmas
abbrev NodeId := String
inductive Entry where
| leaf (value : Int)
| branch (l : NodeId) (r : NodeId)
instance : Membership NodeId Entry where
mem
| .leaf _, _ => False
| .branch l r, k => k = l ∨ k = r
@[simp] theorem Entry.mem_leaf_eq (x : Int) (k : NodeId) : (k ∈ Entry.leaf x) = False := rfl
@[simp] theorem Entry.mem_branch_eq (l r : NodeId) (k : NodeId) : (k ∈ Entry.branch l r) = (k = l ∨ k = r) := rfl
abbrev MapTree' := Std.HashMap NodeId Entry
inductive IsValidKey (m : MapTree') : NodeId → Prop where
| intro (k : NodeId) (hmem : k ∈ m) (h : ∀ a ∈ m[k], IsValidKey m a) : IsValidKey m k
theorem IsValidKey.mem {m : MapTree'} {k : NodeId} : IsValidKey m k → k ∈ m
| ⟨_, mem, _⟩ => mem
theorem IsValidKey.insertIfNew {t : MapTree'} {k : NodeId} (h : IsValidKey t k) (k' : NodeId) (e : Entry) :
IsValidKey (t.insertIfNew k' e) k := by
induction h with
| intro k hmem h' ih =>
refine ⟨_, ?_, ?_⟩
· simp; right; assumption
· intro a ha
apply ih
simp [Std.HashMap.getElem_insertIfNew] at ha
have : ¬(k' = k ∧ ¬k' ∈ t) := fun ⟨h₁, h₂⟩ => h₂ (h₁ ▸ h)
simpa [this] using ha
theorem IsValidKey.forall_insertIfNew {t : MapTree'} (ht : ∀ a ∈ t, IsValidKey t a) (k : NodeId) (e : Entry) (he : ∀ a ∈ e, a ∈ t) :
∀ a ∈ t.insertIfNew k e, IsValidKey (t.insertIfNew k e) a := by
intro k'
simp only [Std.HashMap.mem_insertIfNew, beq_iff_eq]
by_cases h : k' ∈ t
· intro _
exact (ht k' h).insertIfNew k e
· intro h'; replace h' := h'.resolve_right h
subst h'
constructor
· simp [Std.HashMap.getElem_insertIfNew, h]
intro a ha
exact (ht a (he a ha)).insertIfNew k e
· simp
def MapTree := { t : MapTree' // ∀ k ∈ t, IsValidKey t k }
def MapTree.addLeaf (t : MapTree) (k : NodeId) (v : Int) : MapTree :=
⟨t.1.insertIfNew k (.leaf v), IsValidKey.forall_insertIfNew t.2 k _ (by simp)⟩
def MapTree.addBranch (t : MapTree) (k : NodeId) (l : NodeId) (r : NodeId) (hl : l ∈ t.1) (hl : r ∈ t.1) : MapTree :=
⟨t.1.insertIfNew k (.branch l r), IsValidKey.forall_insertIfNew t.2 k _ (by simp_all)⟩
Robin Arnez (Mar 05 2025 at 20:41):
Note that the definition here is more of an acyclic graph than a tree (in other words, one node can have multiple parents and there can be multiple roots). If you really want a "tree" tree than just use the inductive
inductive Tree where
| leaf (v : Int)
| branch (l : Tree) (r : Tree)
aron (Mar 09 2025 at 10:26):
@Robin Arnez amazing, that's exactly what I was looking for!
And yes you're right this is a DAG not a tree, but that's what I want so this is perfect.
Now I need to try and understand how this all works so I can adapt it to the slightly different data structure I'm actually working with...
aron (Mar 15 2025 at 20:57):
@Robin Arnez, I'm using your approach in my attempt to implement a type system. I'm doing this by using unification variables as the map keys and type constraints as the map values.
Now, I'm trying to traverse the tree map to "zonk" all the types in the map but I can't quite figure out how to prove the last missing piece: that any unification vars referenced in a value in the map are also guaranteed to be present in that map.
I've tried to cut down and simplify my code as much as I can but there's still a fair bit of context I couldn't reduce. Could you help me figure this out? :pray:
import Std.Data.HashMap
open Std
def HashMap.attach {k v : Type u} [DecidableEq k] [Hashable k] (map : HashMap k v) : List {key : k // key ∈ map} :=
map.keys.attach.map fun ⟨key, hkey⟩ => ⟨key, HashMap.mem_keys.mp hkey⟩
structure TypeVar where
index : Nat
deriving DecidableEq
structure UniVar where
index : Nat
deriving Hashable, DecidableEq
inductive Primitive where
| unit
| string
| int
| bool
| nat
deriving DecidableEq
inductive SimpleType where
| primitive : Primitive → SimpleType
| typeCtor : String → List UniVar → SimpleType
instance instMembershipUniVarSimpleType : Membership UniVar SimpleType where
mem container item :=
match container with
| .primitive _ => False
| .typeCtor _ uniVars => uniVars.contains item
instance instMembershipUniVarOptionSimpleType : Membership UniVar (Option SimpleType) where
mem container item :=
match container with
| .some t => item ∈ t
| .none => False
instance instMembershipUniVarPrimitive : Membership UniVar Primitive where
mem _ _ := False
instance {a} : Membership UniVar (HashMap UniVar a) where
mem container item :=
container.contains item
instance : Membership SimpleType (Option SimpleType) where
mem container item :=
match container with
| .some t => item = t
| .none => False
@[simp] theorem SimpleType.mem_leaf_eq (x : Primitive) (k : UniVar) : (k ∈ SimpleType.primitive x) = False := rfl
@[simp] theorem SimpleType.mem_typeCtor_eq {ctor : String} (uniVars : List UniVar) (k : UniVar) : (k ∈ SimpleType.typeCtor ctor uniVars) = (k ∈ uniVars) := by
simp [instMembershipUniVarSimpleType]
abbrev SimpleTypeMap' := HashMap UniVar (Option SimpleType)
mutual
/-- The inductive predicate that a uniVar is a valid key in a `SimpleTypeMap'`, that every univar at a value in the map is also a valid key in the map, and that there are no cycles of univars looping round to point to each other! -/
inductive IsValidUniVarKey (m : SimpleTypeMap') : UniVar → Prop where
| mk (uniVar : UniVar)
(hmem : uniVar ∈ m)
(h : ∀ t ∈ m[uniVar]'hmem, ContainsValidUniVars m t) : IsValidUniVarKey m uniVar
/-- The predicate that a type is in the map and that all univars in that type are also in the map -/
inductive ContainsValidUniVars (m : SimpleTypeMap') : SimpleType → Prop where
| mk (t : SimpleType)
(h : ∀ uv ∈ t, IsValidUniVarKey m uv) : ContainsValidUniVars m t
end
def ContainsValidUniVars.mkFromPrimitive (m : SimpleTypeMap') (p : Primitive) : ContainsValidUniVars m (SimpleType.primitive p) := by
refine .mk (SimpleType.primitive p) ?_
intro uv huv
simp only [instMembershipUniVarSimpleType] at huv
def ContainsValidUniVars.mkFromKnownPrimitive (m : SimpleTypeMap') {t : SimpleType} (p : Primitive) (h : t = SimpleType.primitive p) : ContainsValidUniVars m t := by
rw [h]
exact .mkFromPrimitive m p
def ContainsValidUniVars.mkFromEmptyCtor (ctor : String) {t : SimpleType} (m : SimpleTypeMap') (h : t = SimpleType.typeCtor ctor []) : ContainsValidUniVars m t := by
rw [h]
refine .mk (SimpleType.typeCtor ctor []) ?_
intro uv huv
simp [instMembershipUniVarSimpleType] at huv
@[simp] theorem SimpleType.uniVar_nin_primitive {uv : UniVar} {p : Primitive} : (uv ∈ SimpleType.primitive p) = False := by
simp [instMembershipUniVarSimpleType]
@[simp] theorem SimpleType.uniVar_in_typeCtor {uv : UniVar} {ctor : String} {uniVars : List UniVar} : (uv ∈ SimpleType.typeCtor ctor uniVars) = (uv ∈ uniVars) := by
simp [instMembershipUniVarSimpleType]
def SimpleTypeMap := { map : SimpleTypeMap' // ∀ uv ∈ map, IsValidUniVarKey map uv }
inductive SimpleZonkedType where
| primitive : Primitive → SimpleZonkedType
| typeVar : TypeVar → SimpleZonkedType
| typeCtor : String → List SimpleZonkedType → SimpleZonkedType
abbrev SimpleZonkedTypeMap := HashMap UniVar SimpleZonkedType
/-- We go through the map one univar key at a time, and zonk it until it can no longer be zonked. At which point by the end we should have a map of fully zonked types, which should be trivially convertible to a `SimpleZonkedTypeMap` I believe. -/
def zonkSimpleTypeMap (uniVarsMap : SimpleTypeMap) : SimpleZonkedTypeMap :=
let firstTypeVar : TypeVar := .mk 0
let allUniVars := uniVarsMap.1 |> HashMap.attach
let map :=
allUniVars.foldl (init := (HashMap.empty, firstTypeVar)) (fun (acc, accTypeVar) ⟨uv, uvInMap⟩ =>
let isValidUniVarKey := uniVarsMap.2 uv uvInMap
let zonked := zonkSingleUv uniVarsMap uv accTypeVar isValidUniVarKey
(acc.insert uv zonked.1, zonked.2)
)
map.1
where
zonkSingleUv (uniVarsMap : SimpleTypeMap) (uv : UniVar) (nextTypeVar : TypeVar) (uh : IsValidUniVarKey uniVarsMap.1 uv) : SimpleZonkedType × TypeVar :=
let hmem :=
match uh with
| IsValidUniVarKey.mk _ hmem _ => hmem
match hh : uniVarsMap.1[uv]'hmem with
| none =>
(SimpleZonkedType.typeVar nextTypeVar, { nextTypeVar with index := nextTypeVar.index + 1 })
| some someVal =>
by
refine zonkSimpleType uniVarsMap someVal ?_ nextTypeVar
cases hhh : someVal with
| primitive p =>
refine .mkFromKnownPrimitive uniVarsMap.1 p ?_
rfl
| typeCtor tag uniVars =>
cases univarsPrf : uniVars with
| nil =>
refine .mkFromEmptyCtor tag uniVarsMap.1 ?_
rewrite [univarsPrf] at hhh
rfl
| cons hd tl =>
have tt : ∀ t ∈ uniVarsMap.1[uv], ContainsValidUniVars uniVarsMap.1 t:=
match uh with
| IsValidUniVarKey.mk _ _ tt => tt
rw [← univarsPrf, ← hhh]
constructor
intro thisUv huv
simp [instMembershipUniVarOptionSimpleType] at huv
sorry
zonkSimpleType (uniVarsMap : SimpleTypeMap) (t : SimpleType) (h : ContainsValidUniVars uniVarsMap.1 t) (nextTypeVar : TypeVar) : SimpleZonkedType × TypeVar :=
match hh : t with
| .primitive p => (SimpleZonkedType.primitive p, nextTypeVar)
| .typeCtor ctor uniVars =>
let zonkedArgs :=
uniVars.attach.foldl (init := ([], nextTypeVar))
fun (acc, accTypeVar) ⟨thisUv, thisUvInUniVars⟩ =>
by
let zonked := zonkSingleUv uniVarsMap thisUv accTypeVar ?_
refine (zonked.1 :: acc, zonked.2)
have ⟨ _, huv ⟩ := h
have : thisUv ∈ SimpleType.typeCtor ctor uniVars → IsValidUniVarKey uniVarsMap.1 thisUv := huv thisUv
simp [SimpleType.uniVar_in_typeCtor] at this
exact this thisUvInUniVars
(SimpleZonkedType.typeCtor ctor zonkedArgs.1, zonkedArgs.2)
I've put a sorry
at the location in the code I couldn't work out how to prove
aron (Mar 15 2025 at 21:00):
The actual error that Lean gives me is a fail to show termination
error but I suspect that if I can actually finish the proof, that Lean will figure out that the mutually recursive calls of zonkSimpleTypeMap.zonkSingleUv
and zonkSimpleTypeMap.zonkSimpleType
will terminate due to structural recursion on IsValidUniVarKey
and ContainsValidUniVars
Robin Arnez (Mar 16 2025 at 08:10):
This works:
New code
aron (Mar 16 2025 at 16:11):
@Robin Arnez that does seem to work! thank you!
But my actual code is slightly different and I'm getting an error on line 99 of your code snippet that I can't figure out how to solve. In my code the type of IsValidUniVarKey.children
is
IsValidUniVarKey.children {m u}
(h : IsValidUniVarKey m u)
: ∀ t ∈ m.outerMap[u]'h.mem |>.2, ∀ uv ∈ t, IsValidUniVarKey m uv
I don't think the change from m
to m.outerMap
is an issue, because I can just replace that and I think it's all the same. But I think the change from ∀ t ∈ <expr>
to ∀ t ∈ <expr> |>.2
is the cause of the problems I'm getting. The issue I get is
application type mismatch
IsValidUniVarKey.children uh ?m.34287 ?m.35004 thisUv hh''
argument
hh''
has type
thisUv ∈ uniVars : Prop
but is expected to have type
thisUv ∈ ?m.34287 : Prop
and when I replace hh''
with a placeholder ?_
in a refine
tactic, then I get an error about hh
also:
application type mismatch
IsValidUniVarKey.children uh ?m.34287 hh
argument
hh
has type
(outerMap ↑uniVarsMap)[uv].2 = some (SimpleType.typeCtor ctor uniVars) : Prop
but is expected to have type
?m.34287 ∈ (outerMap ↑uniVarsMap)[uv].2 : Prop
I can't figure out how to solve this issue on my own because the types of hh
and hh''
that we're passing to uh.children
don't seem to match its expected parameters so I don't know how to adapt the proofs accordingly. E.g. we're passing hh
as the 2nd param, but when I hover over uh.children
it tells me that its 2nd (non-implicit) param is (t : SimpleType)
, not uniVarsMap.1.outerMap[uv].2 = some (SimpleType.typeCtor ctor uniVars)
which is the type of hh
.
And in your code snippet, uh.children
's 4th param seems to be t ∈ m[u]
, but we're passing it hh''
whose type is thisUv ∈ uniVars
. How does that match the expected type? Is lean automatically coercing thisUv ∈ uniVars
to t ∈ m[u]
because it knows that thisUv ∈ uniVars
iff t ∈ m[u]
? If so, how do I tell lean that in my code likewise thisUv ∈ uniVars
iff t ∈ m.outerMap[u].2
?
aron (Mar 16 2025 at 17:15):
Ok I've simplified your snippet to change the h
proof inside IsValidUniVarKey
from:
∀ t ∈ m[uniVar], ∀ uv ∈ t, IsValidUniVarKey m uv
to
∀ uv ∈ m[uniVar], IsValidUniVarKey m uv
Simplified code
I did this because t ∈ m[uniVar]
is unnecessary, since m[uniVar]
doesn't contain a t
, it is a t
, so hopefully after I do this simplification in my own code it'll be clearer to me how to provide the missing proofs to uh.children
Robin Arnez (Mar 16 2025 at 17:50):
m[...]
was an Option
, wasn't it? DON'T abuse membership instances like that. Every types should have exactly one type that it "contains" or you're gonna get all sorts of problems. That's why I removed pretty much all instances in my version as well.
Robin Arnez (Mar 16 2025 at 17:50):
Option T
already has a canoncial membership of type T
, so adding another will cause problems
aron (Mar 16 2025 at 18:18):
I don't think I'm overloading membership instances in my current code?
These are the only ones I've got atm (live lean link):
instance : Membership UniVar SimpleType where
mem container item :=
match container with
| .primitive _ => False
| .typeCtor _ uniVars => uniVars.contains item
instance : Membership UniVar (Option SimpleType) where
mem container item :=
match container with
| .some t => item ∈ t
| .none => False
instance : Membership UniVar Primitive where
mem _ _ := False
instance {a} : Membership UniVar (HashMap UniVar a) where
mem container item :=
container.contains item
instance {a} : Membership UniVar (a × Option SimpleType) where
mem container item := item ∈ container.2
I don't think there's any ambiguity in these instances, is there?
although I see what you mean about Option T
already having a membership instance so I don't need to make my own. I'll just remove that one then. My instance doesn't really do anything funky with it anyway so I don't think that would've messed anything up :man_shrugging:
aron (Mar 16 2025 at 18:20):
either way I don't think the membership instances are what caused my adaptation of your code to not work
Aaron Liu (Mar 16 2025 at 18:32):
Membership UniVar (Option SimpleType)
overloads Membership SimpleType (Option SimpleType)
(from docs#Option.instMembership).
aron (Mar 16 2025 at 18:47):
Oh hmm I see. Tbh this feel like a strange restriction to me. I'd have expected that if I have a membership instance of an A
in a B
, and the generic Option t
has its own membership instance, then the membership of A
in a Option B
should just work transitively the same way that A
in B
works
aron (Mar 16 2025 at 18:52):
I think my expectation would work fine if you could add a type annotation on a forall membership declaration, but since you can't you have to rely on whatever lean infers as the member item's type, which yeah, doesn't really play nicely with what I want to do here :thinking:
Kevin Buzzard (Mar 16 2025 at 19:10):
Yeah docs#Membership is a class and alpha (the "element" type) is an outparam so this instance isn't going to go well with our current setup.
Kevin Buzzard (Mar 16 2025 at 19:13):
To be honest I personally think that Option.instMembership is a daft instance, my mental model of Option
isn't "the empty set and then a bunch of one-element sets", it's "the original type plus a new term". But maybe this notation is standard in some areas? How much is this used?
Robin Arnez (Mar 16 2025 at 19:24):
Well, not sure how it's used but I feel like the membership instance makes sense if you think about the connection between List
and Option
with map
, filter
and others. I mean I guess there are a few ways to think about a type with one additional element, a list/set with at most one element just being one of them. Still, the current api doesn't really lean into the analogy, e.g. EmptyCollection
and Singleton
(in theory Subset
/ Inter
) don't exist and membership is mostly just simplified back into a equality.
aron (Mar 16 2025 at 19:31):
I think my expectation would work fine if you could add a type annotation on a forall membership declaration, but since you can't you have to rely on whatever lean infers as the member item's type
oh I found a cheeky way to do this actually, because by looking at the code for the ∀
notation I realised that ∀ x ∈ m, Pred x m
is actually syntax sugar for ∀ x, x ∈ m → Pred x m
, and in that syntax you can just add a type annotation on x
:
∀ (x : T), x ∈ m → Pred x m
(sorry @Robin Arnez :grimacing:)
Robin Arnez (Mar 16 2025 at 19:35):
aron schrieb:
Ok I've simplified your snippet to change the
h
proof insideIsValidUniVarKey
from:∀ t ∈ m[uniVar], ∀ uv ∈ t, IsValidUniVarKey m uv
to
∀ uv ∈ m[uniVar], IsValidUniVarKey m uv
Is that really a simplification though? I feel like you just have more effort with the proof in zonkSingleUv
aron (Mar 16 2025 at 21:54):
well atm I can't adapt the proof to my code at all so idk if it would make a difference either way :sob:
Last updated: May 02 2025 at 03:31 UTC