Zulip Chat Archive
Stream: new members
Topic: Dictionary datatype with `ToExpr`
Youngju Song (Jan 29 2025 at 05:30):
I have been using Std.HashMap K V
as a "default choice" when it comes to dictionary data structure.
Today, I encountered a case where I need a dictionary with ToExpr
instance, but Std.HashMap
doesn't seem to provide it.
deriving instance ToExpr for Std.HashMap -- this fails
Would it be possible to derive ToExpr
for Std.HashMap
easily? I tried something like follows quickly, but it fails for unknown reason:
deriving instance ToExpr for List
deriving instance ToExpr for Prod
instance [ToExpr K] [ToExpr V] [BEq K] [Hashable K] : ToExpr (Std.HashMap K V) := {
toExpr := λ x =>
let e := x.toList |> toExpr
(const `Std.HashMap.toList [Level.zero]) |>.app e
toTypeExpr := (const `Std.HashMap [Level.zero]) |>.app (toTypeExpr (α := K)) |>.app (toTypeExpr (α := V))
}
def boo := [(0,10), (2,30), (4,50)]
def foo: Std.HashMap Nat Nat := Std.HashMap.ofList boo
#eval (Std.HashMap.toList foo)
#eval (foo |> toExpr) -- fails to find typeclass instance but why??
.
.
If defining ToExpr
for Std.HashMap
is not straightforward, I am looking for an alternative dictionary data structure which
- is traversable (which excludes
K → Option V
) - comes with
ToExpr
- easily serializable to/deserializable from
String
I can compromise computational efficiency.
I can think of something like Lean.AssocList
or K → Option V
together with its domain, would there be better choices?
Asei Inoue (Jan 29 2025 at 13:49):
I'm interested in this, too.
Henrik Böving (Jan 29 2025 at 13:54):
If you want to serialize and deserialize from string why are you looking into ToExpr
?
Youngju Song (Jan 29 2025 at 18:57):
@Henrik Böving Because I am using it inside my elaborator, and (I think) term elaborators should return Expr
at the end of the day.
Henrik Böving (Jan 29 2025 at 18:58):
so why are you looking into serialisation from and to strings then?
Youngju Song (Jan 29 2025 at 19:04):
@Henrik Böving The elaborator returns this dictionary which gets used in other parts of my program, and in this "other parts" I need serialization/deserialization. Thinking again about it, being "traversable" kind of subsumes this constraint - for whatever dictionary type Dict
that supports toList/ofList
, I can serialize/deserialize using these that.
Henrik Böving (Jan 29 2025 at 19:05):
That sounds really hacky to me, can you give an example of what you are trying to achieve?
Kyle Miller (Jan 29 2025 at 19:08):
I expect you'd be better served by docs#Lean.ToJson
Kyle Miller (Jan 29 2025 at 19:10):
Using ToExpr
is like trying to do serialization/deserializing by turning data into executable programs that the receiver has to run. That can't be the best way! (Plus it has massive security implications if you're not in total control of the data.)
Youngju Song (Jan 29 2025 at 19:38):
@Henrik Böving If serialization/deserialization part sounds hacky to you, please forget about them: this was for debugging purposes. Fyi, I ended up using AssocList
for my purpose and it worked, but I am still curious if I can stick to Std.HashMap
somehow.
And here is an example that explains why I want ToExpr
for a dictionary. I am writing a term elaborator for SMT formula that additionally "infer"s declarations. For example, given this syntax: S|x && y
, I want it to be "eventually" translated to the following smtlib2 query.
(declare-fun x () Bool)
(declare-fun y () Bool)
(assert (& x y))
(check-sat)
Note that in the example above, the type of the variables x
and y
are not explicitly specified but inferred from the formula (I know this is not always possible and I have ways to explicitly declare variables, but that is orthogonal). For this, I have wrote an elaborator that returns Q(AST * Dict Ident Ty)
(an abstract syntax tree for the formula and a dictionary for inferred declarations). This Dict Ident Ty
is then later used for generating the above smtlib2 query.
Henrik Böving (Jan 29 2025 at 19:40):
I would suggest to architecture your elaborator in such a way that the serialisation step to expr is simply not necessary in that case, I don't see why you should be forced to perform a serialisation step in order to achieve what you are trying to do
Youngju Song (Jan 29 2025 at 19:49):
@Henrik Böving To avoid possible confusion - I was using serialization only to mean serialization to String
, and please forget about them. For ToExpr
- isn't it the case that a term elaborator needs to return an Expr
at the end?
Henrik Böving (Jan 29 2025 at 19:51):
Sure but I don't see the need to keep pushing and pulling any sort of map around to achieve that
Youngju Song (Jan 29 2025 at 19:53):
@Henrik Böving I don't see what you mean, can you elaborate?
Henrik Böving (Jan 29 2025 at 19:55):
If you are writing a term elaborator that elaborator can still maintain some internal datastructures and only return one big expression for an entire term that it is working on. You can have a quotation like [smtlib| <smtstuff here>]
and write a single term elaborator that just does whatever you want and returns a final expression for your smtlib stuff in the end
Henrik Böving (Jan 29 2025 at 19:55):
But I can't be concrete unless you actually show us some code you are working on
Youngju Song (Jan 29 2025 at 20:33):
@Henrik Böving
If you are writing a term elaborator that elaborator can still maintain some internal datastructures and only return one big expression for an entire term that it is working on.
Sure. If I am returning a String
for smtlib expression directly from the elaborator in one go, I wouldn't need to expose those AST
or Dict Ident Ty
and just return Q(String)
right away. But I have procedures outside the elaborator that act on AST
and Dict Ident Ty
type (e.g., doing traversal and some basic simplification). And here Dict Ident Ty
is not just an internal data structure for an elaborator - this is a new information acquired during elaboration and gets used outside the elaborator.
Anyway, I guess this specific example could lead us to a rabbit hole. Coming back to the original question: is it the case that quoting dictionary-like structure to Expr
and returning it in the term elaboration considered a bad practice? If so, what would be the reason? (are there particular anti-patterns here?) I was thinking that it is fine to return anything (quoted to Expr
) in the term elaborator.
Henrik Böving (Jan 29 2025 at 20:34):
Well clearly serializing and deserializing a datstructure over and over again within a single system is significant overhead that should never be necessary if the system was architectured properly.
Kyle Miller (Jan 29 2025 at 20:35):
Youngju Song said:
Coming back to the original question: is it the case that quoting dictionary-like structure to
Expr
and returning it in the term elaboration considered a bad practice?
Yes, it's an anti-pattern.
Generally complex elaborators are not built out of individual term elaborators. There's a single main entry point that sets up its own elaboration strategy with its own intermediate data structures.
Youngju Song (Jan 29 2025 at 20:54):
@Henrik Böving I don't see how "quoting dictionary-like structure to Expr
" has to mean serializing/deserializing over and over again. If you are talking about my specific example above, it doesn't: there is a function elab_aux (...): TermElabM (AST × Dict Ident Ty)
that does the recursion and I call ToExpr
only once in the main entry point elab := (elab_aux ...) <&> toExpr
.
@Kyle Miller For "internal" datastructures, yes, sure. I completely agree that it never needs to quote an "internal" data structure (that gets used only inside a term elaborator) to Expr
. Just to be clear: I am not doing that. Those "internal" datastructures are set up in the "single main entry point" in my case as well. It is just that the dictionary here is genuinely not an "internal" data structure. And (from the conversation above) since it seemed like quoting a dictionary is considered evil regardless of its nature, I was curious to know why.
Kyle Miller (Jan 30 2025 at 17:17):
It's not that quoting a dictionary is an antipattern — I don't see any problem with that, provided the dictionary data is not used in the current stage. The antipattern is reinterpreting the dictionary data from the Expr.
Last updated: May 02 2025 at 03:31 UTC