Zulip Chat Archive

Stream: lean4

Topic: type-safe HTML: syntax vs elaboration?

Yuri de Wit (Jun 10 2022 at 18:10):

I have been thinking about the possibilities for type-safe HTML templates in Lean4 (or markup in general). Here, by type safe, I mean more than valid and safe HTML markup (btw, I have seen existing work on this by @Sebastian Ullrich and @Henrik Böving, which I am using as a starting point - thank you), but also making sure it conforms to, for instance, HTML5 to the extent possible (e.g. <p> only accept text nodes).

In my limited view, I see two general directions:

(1) keep the syntax simple/generic (parse any element/attribute/text/comment/etc) and write enhanced elaborators that can check all things HTML5. This seems to be the current current starting point: a general syntax that can capture XML markup into a general AST

(2) have a concrete syntax for each valid HTML5 element/attribute. E.g. If I wanted to create an Aria enabled template, I would also import an Html5.Aria module with syntax definitions for aria elements/attributes (e.g. aria-autocomplete or aria-checked). This concrete syntax would then produce a type-safe HTML5 AST for further elaboration, linting, etc.

Do you have insights into pros and cons between (1) and (2) considering that the end goal here would be to have all the IDE features one would expect nowadays (code completion, formatting, linting, etc) coming, possibly, directly from the lean server?

Xubai Wang (Jun 10 2022 at 18:30):

Do you mean kinds of content? Won't type classes just suffice? For the attributes, actually there are hardly any constraints on html attributes (think of aria-, data- and custom elements).

Mario Carneiro (Jun 10 2022 at 18:37):

I think (1) is far more likely to scale up to all the complications of a real world format like HTML compared to (2). There are just way too many things to check and not all of them can be conveniently represented in the type of the output, unless it is just a literal transcription of the input

Yuri de Wit (Jun 10 2022 at 18:40):

Xubai Wang said:

Do you mean kinds of content? Won't type classes just suffice? For the attributes, actually there are hardly any constraints on html attributes (think of aria-, data- and custom elements).

Yes, what kinds of content can go where. Types of attribute values (booleans, dates, etc).

Yuri de Wit (Jun 10 2022 at 18:44):

Mario Carneiro said:

I think (1) is far more likely to scale up to all the complications of a real world format like HTML compared to (2). There are just way too many things to check and not all of them can be conveniently represented in the type of the output, unless it is just a literal transcription of the input

Yes, I see that and the fact that validations/linting can can grow more organically over time.

Another aspect here is the fact that even if the HTML is invalid we still want to keep as many IDE features available as possible.

How does (1) or (2) impacts code completion, code formatting in lean server (have zero knowledge about it). At the end of the day the most important thing is that users can see available aria attributes that can be added to an existing element based on current imports, etc.

Last updated: May 02 2025 at 03:31 UTC