Zulip Chat Archive

Stream: new members

Topic: Dictionary datatype with `ToExpr`

Youngju Song (Jan 29 2025 at 05:30):

I have been using Std.HashMap K V as a "default choice" when it comes to dictionary data structure.
Today, I encountered a case where I need a dictionary with ToExpr instance, but Std.HashMap doesn't seem to provide it.

deriving instance ToExpr for Std.HashMap -- this fails

Would it be possible to derive ToExpr for Std.HashMap easily? I tried something like follows quickly, but it fails for unknown reason:

deriving instance ToExpr for List
deriving instance ToExpr for Prod

instance [ToExpr K] [ToExpr V] [BEq K] [Hashable K] : ToExpr (Std.HashMap K V) := {
  toExpr := λ x =>
    let e := x.toList |> toExpr
    (const `Std.HashMap.toList [Level.zero]) |>.app e
  toTypeExpr := (const `Std.HashMap [Level.zero]) |>.app (toTypeExpr (α := K)) |>.app (toTypeExpr (α := V))
}
def boo := [(0,10), (2,30), (4,50)]
def foo: Std.HashMap Nat Nat := Std.HashMap.ofList boo
#eval (Std.HashMap.toList foo)
#eval (foo |> toExpr) -- fails to find typeclass instance but why??

.
.

If defining ToExpr for Std.HashMap is not straightforward, I am looking for an alternative dictionary data structure which

is traversable (which excludes K → Option V)
comes with ToExpr
easily serializable to/deserializable from String

I can compromise computational efficiency.

I can think of something like Lean.AssocList or K → Option V together with its domain, would there be better choices?

Asei Inoue (Jan 29 2025 at 13:49):

I'm interested in this, too.

Henrik Böving (Jan 29 2025 at 13:54):

If you want to serialize and deserialize from string why are you looking into ToExpr?

Youngju Song (Jan 29 2025 at 18:57):

@Henrik Böving Because I am using it inside my elaborator, and (I think) term elaborators should return Expr at the end of the day.

Henrik Böving (Jan 29 2025 at 18:58):

so why are you looking into serialisation from and to strings then?

Youngju Song (Jan 29 2025 at 19:04):

@Henrik Böving The elaborator returns this dictionary which gets used in other parts of my program, and in this "other parts" I need serialization/deserialization. Thinking again about it, being "traversable" kind of subsumes this constraint - for whatever dictionary type Dict that supports toList/ofList, I can serialize/deserialize using these that.

Henrik Böving (Jan 29 2025 at 19:05):

That sounds really hacky to me, can you give an example of what you are trying to achieve?

Kyle Miller (Jan 29 2025 at 19:08):

I expect you'd be better served by docs#Lean.ToJson

Kyle Miller (Jan 29 2025 at 19:10):

Using ToExpr is like trying to do serialization/deserializing by turning data into executable programs that the receiver has to run. That can't be the best way! (Plus it has massive security implications if you're not in total control of the data.)

Youngju Song (Jan 29 2025 at 19:38):

@Henrik Böving If serialization/deserialization part sounds hacky to you, please forget about them: this was for debugging purposes. Fyi, I ended up using AssocList for my purpose and it worked, but I am still curious if I can stick to Std.HashMap somehow.

And here is an example that explains why I want ToExpr for a dictionary. I am writing a term elaborator for SMT formula that additionally "infer"s declarations. For example, given this syntax: S|x && y, I want it to be "eventually" translated to the following smtlib2 query.

(declare-fun x () Bool)
(declare-fun y () Bool)
(assert (& x y))
(check-sat)

Note that in the example above, the type of the variables x and y are not explicitly specified but inferred from the formula (I know this is not always possible and I have ways to explicitly declare variables, but that is orthogonal). For this, I have wrote an elaborator that returns Q(AST * Dict Ident Ty) (an abstract syntax tree for the formula and a dictionary for inferred declarations). This Dict Ident Ty is then later used for generating the above smtlib2 query.

Henrik Böving (Jan 29 2025 at 19:40):

I would suggest to architecture your elaborator in such a way that the serialisation step to expr is simply not necessary in that case, I don't see why you should be forced to perform a serialisation step in order to achieve what you are trying to do

Youngju Song (Jan 29 2025 at 19:49):

@Henrik Böving To avoid possible confusion - I was using serialization only to mean serialization to String, and please forget about them. For ToExpr - isn't it the case that a term elaborator needs to return an Expr at the end?

Henrik Böving (Jan 29 2025 at 19:51):

Sure but I don't see the need to keep pushing and pulling any sort of map around to achieve that

Youngju Song (Jan 29 2025 at 19:53):

@Henrik Böving I don't see what you mean, can you elaborate?

Henrik Böving (Jan 29 2025 at 19:55):

If you are writing a term elaborator that elaborator can still maintain some internal datastructures and only return one big expression for an entire term that it is working on. You can have a quotation like [smtlib| <smtstuff here>] and write a single term elaborator that just does whatever you want and returns a final expression for your smtlib stuff in the end

Henrik Böving (Jan 29 2025 at 19:55):

But I can't be concrete unless you actually show us some code you are working on

Youngju Song (Jan 29 2025 at 20:33):

@Henrik Böving

If you are writing a term elaborator that elaborator can still maintain some internal datastructures and only return one big expression for an entire term that it is working on.

Sure. If I am returning a String for smtlib expression directly from the elaborator in one go, I wouldn't need to expose those AST or Dict Ident Ty and just return Q(String) right away. But I have procedures outside the elaborator that act on AST and Dict Ident Ty type (e.g., doing traversal and some basic simplification). And here Dict Ident Ty is not just an internal data structure for an elaborator - this is a new information acquired during elaboration and gets used outside the elaborator.

Anyway, I guess this specific example could lead us to a rabbit hole. Coming back to the original question: is it the case that quoting dictionary-like structure to Expr and returning it in the term elaboration considered a bad practice? If so, what would be the reason? (are there particular anti-patterns here?) I was thinking that it is fine to return anything (quoted to Expr) in the term elaborator.

Henrik Böving (Jan 29 2025 at 20:34):

Well clearly serializing and deserializing a datstructure over and over again within a single system is significant overhead that should never be necessary if the system was architectured properly.

Kyle Miller (Jan 29 2025 at 20:35):

Youngju Song said:

Coming back to the original question: is it the case that quoting dictionary-like structure to Expr and returning it in the term elaboration considered a bad practice?

Yes, it's an anti-pattern.

Generally complex elaborators are not built out of individual term elaborators. There's a single main entry point that sets up its own elaboration strategy with its own intermediate data structures.

Youngju Song (Jan 29 2025 at 20:54):

@Henrik Böving I don't see how "quoting dictionary-like structure to Expr" has to mean serializing/deserializing over and over again. If you are talking about my specific example above, it doesn't: there is a function elab_aux (...): TermElabM (AST × Dict Ident Ty) that does the recursion and I call ToExpr only once in the main entry point elab := (elab_aux ...) <&> toExpr.
@Kyle Miller For "internal" datastructures, yes, sure. I completely agree that it never needs to quote an "internal" data structure (that gets used only inside a term elaborator) to Expr. Just to be clear: I am not doing that. Those "internal" datastructures are set up in the "single main entry point" in my case as well. It is just that the dictionary here is genuinely not an "internal" data structure. And (from the conversation above) since it seemed like quoting a dictionary is considered evil regardless of its nature, I was curious to know why.

Kyle Miller (Jan 30 2025 at 17:17):

It's not that quoting a dictionary is an antipattern — I don't see any problem with that, provided the dictionary data is not used in the current stage. The antipattern is reinterpreting the dictionary data from the Expr.

Last updated: May 02 2025 at 03:31 UTC