Zulip Chat Archive

Stream: general

Topic: A plea from a newbie concerning future Unicode use

mars0i (Sep 08 2024 at 18:13):

Background: This page explains how to use · to create prefix functions from operators, e.g. (· + ·).

Though I see Unicode use in Lean as very valuable to make mathematical proofs easier to read, defining general-purpose operations like this one using Unicode characters that are easily mistaken for other characters seems error-prone and liable to create frustration. Why not use ASCII for general functionality? Or at least Unicode that's not visually ambiguous? I'm not asking for changes now, obviously, but proposing a shift in policy for future versions.

(In the case of (· + ·), I saw the examples, and thought, "Oh, let me try typing LaTeX's "\cdot" (in nvim). That worked! Except that it didn't because that's not the right character, and using \cdot produce errors, of course. So what is the character? That documentation page doesn't tell me. So I have to identify the character as u00b7, but now I still don't now an easy way to type it. This seems like an unnecessary waste of time. It's the kind of thing that can turn off people who might find Lean valuable.)

Mario Carneiro (Sep 08 2024 at 18:14):

if you hover the symbol in vscode it will say how you can type it

Henrik Böving (Sep 08 2024 at 18:15):

https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/Unicode.20abbreviations.20in.20vim.2Fnvim/near/468607392 there's no need to double post, people do see your message in new members as well^^

Edward van de Meent (Sep 08 2024 at 18:15):

fwiw, i believe you can just use . too...

Mario Carneiro (Sep 08 2024 at 18:16):

all I can say regarding your plea is that that ship has sailed, lean no longer has ascii equivalents for many things

Edward van de Meent (Sep 08 2024 at 18:17):

i'd like to note that out of all characters, this replacement (. vs ·) is likely the most useless imo

Patrick Massot (Sep 08 2024 at 19:55):

In nvim the default keybinding to show how to type a character is <Leader>\\. This is what you are meant to use every time you see a unicode character in Lean that you don’t know how to type. In case you see on a web-page, you can use copy-pasting first. This is not perfect but a comprise that allowed hundred of new users to benefit from the readability gain coming from unicode.

Shreyas Srinivas (Sep 08 2024 at 20:18):

I think the solution here is to add a unicode tool tip to the doc website with a very visible option to enable it. In verso based blogs you already have tooltips for most things that vscode has them for

mars0i (Sep 08 2024 at 21:19):

Thanks for the help on the other question, @Henrik Böving. I didn't double-post, though. I wanted help with my immediate issue, which you helpfully provided in the other thread. I also wanted to suggest something for future development. I didn't want to mix them up in one thread, and I thought that the suggestion made more sense here. I should have waited a day to post the suggestion here, though, to avoid people viewing it as a request for help. I could have said in a post a day later that my immediate problem had been solved, and that this was a followup.

However, thanks very much @Patrick Massot for the additional help on how to view the keys to generate a character. Your suggestion and @Henrik Böving's in the other thread are exactly what I needed for that question.

And thank you @Edward van de Meent! That's also what I wanted: a way to accomplish sectioning without Unicode.

mars0i (Sep 08 2024 at 21:41):

Coming back to the specific content of my post here:

@Mario Carneiro, OK. I think it would be absurd to try to create ASCII equivalents for every mathematical symbol that's used in proofs. I wonder though whether there could be a long-run goal of having ASCII equivalents for functions that have general application outside of the general goal of mathematical proofs. Lean already has a lot of that.

And maybe thought could be given to whether the benefits of using a particular symbol outweigh confusion with other symbols in relevant contexts? (One of course should choose a good font for Lean--that's fine. Probably no font is perfect, though.)

I'm not suggesting undoing previous decisions, only considering costs and benefits and ways to mitigate downsides of certain symbols. The FRO and community doesn't have to do that, and maybe it's an option that's already been rejected.

I love @Shreyas Srinivas's suggestion. It's a nice way to organize information about special characters. Maybe a lot of work to implement, though? The tool tips could include ASCII equivalents or links to information about them where that's appropriate. (Links because e.g. with Sigma pairs, we have ASCII Sigma and Sigma.mk, but they're not syntactically equivalent to Unicode names for the same constructions.)

mars0i (Sep 08 2024 at 21:44):

Maybe it would help to explain that my current goal is not proofs. I'm writing biological simulations. There will be some proofs included, I expect, but if I can avoid Unicode (and if the non-unicode symbols have documentation that's easy to find), then I think that people outside of the Lean community will have an easier time seeing what my code is doing.

Eric Wieser (Sep 08 2024 at 21:51):

I think what you might be asking for is "documentation for notation" (or more generally, for syntax), which exists in hovers in vscode already, but is not shown in the webpage produced by doc-gen.

Eric Wieser (Sep 08 2024 at 21:52):

I would be surprised if this is not already on @Henrik Böving's radar, but I would also not be surprised if it is low priority

Henrik Böving (Sep 08 2024 at 21:53):

If someone wants to implement a nice doc-gen display for ParserDescr feel free to go ahead, I'd review it^^

Shreyas Srinivas (Sep 08 2024 at 21:59):

Here the lean manual was quoted and I am surprised it isn't verso-fied

Henrik Böving (Sep 08 2024 at 22:00):

There is work underway to provide verso based documentation.

mars0i (Sep 08 2024 at 22:35):

Eric Wieser said:

I think what you might be asking for is "documentation for notation" (or more generally, for syntax), which exists in hovers in vscode already, but is not shown in the webpage produced by doc-gen.

Yes, documentation for notation on the websites would be wonderful, and now I know how to get the information from nvim. But my suggestions went further.

(Amazing IDE interface in nvim compared to other languages I've used. The Lean VSCode IDE must be at least as good.)

Shreyas Srinivas (Sep 08 2024 at 22:37):

I think as Mario suggests that the use of unicode is something too deeply embedded to change such things now.

Shreyas Srinivas (Sep 08 2024 at 22:41):

We can document it. We can provide tooltips in the editor and docpages. There is the symbol list tabled in the vscode extension's doc viewer and so on. But changing the way unicode symbols are used is much much harder

mars0i (Sep 09 2024 at 01:02):

Thanks @Shreyas Srinivas. If you mean that existing symbols won't be changed, I understand. I wouldn't expect that. Even if I wouldn't have chosen · for any common purpose because of its similarity to ⬝ , I certainly recognize that it would be a mistake to change that now.

So maybe, apart from documentation, my suggestions boil down to:

Consider providing ASCII equivalents for common programming functionality.
Don't arbitrarily use Unicode where a reasonable ASCII alternative is available.
For new operations, maybe it would be good to think about things like visual ambiguity.

It may be, though, that you and @Mario Carneiro are saying that the culture surrounding Lean means that these are not things that anyone will take up. If so, thanks for explaining. That's helpful.

My sense is that Lean is already much better at least on 1 and 2 than Agda, so I thought that these might be already part of the culture, and I was just putting in my two cents for more of that. (When I first wrote, I didn't realize that . was an alternative to · .)

(For illustration of what I mean by 2, in Agda, it's possible to get the first element of a sigma pair using fst--I think this is the Haskell source name--but you need an additional import. The normal function for this purpose is proj₁. But fst is a perfectly good name for this, and if one wants a name that means "projection", why not proj1? Why make people type three characters "\_1" where one would be perfectly good? That's my feeling, anyway.)

Violeta Hernández (Sep 09 2024 at 01:42):

Why not use ASCII for general functionality?

Just as a fun fact, out of the ten most spoken languages, the only one that can be entirely written using ASCII is English itself!

Violeta Hernández (Sep 09 2024 at 01:42):

I think that we, in the year 2024, should be embracing Unicode fully.

Violeta Hernández (Sep 09 2024 at 01:43):

Especially when there's built-in support to type these characters on any keyboard

mars0i (Sep 09 2024 at 02:50):

Thanks @Violeta Hernández for pointing that out. I have sympathy for that view in lots of contexts. I think that if some parts of Unicode became widely used in programming, that would be OK. So I kind of agree with you, but obviously I disagree about something. If anyone's interested, here is my reasoning.

I think that at this point, a computer language that uses a lot of Unicode for regular programming tasks is going to seem unfriendly to many programmers--even those who routinely use non-ASCII characters to type in a natural language. Maybe I'm wrong about that last comment--I don't know. At the very least, for a language with Unicode, every user has to learn new key combinations for new characters--unless they already write in a natural language with the same characters. Even someone whose native language isn't based on a variant of the Latin alphabet will probably have learned to program using an ASCII-based programming language, and much of the Unicode in Lean will be different from the characters in their language. In the long run, maybe a new physical or electronic interface will replace our keyboards, and we won't have to worry about learning new multi-keystroke combinations.

I want to emphasize though that I think using Unicode for the math parts of Lean makes it more friendly to people who are used to reading math. I fully support using Unicode in Lean for mathematical expressions. I think it's great. The cost of learning to type new Unicode characters is outweighed by the value of being able to read math easily. (Maybe it would be ideal if every Unicode math character could be entered using a standard LaTeX expression, though. I mentioned above that I originally tried typing LaTeX\cdot for ·, but that made the wrong character.)

One thing that I feel that doesn't depend on whether and in what ways we should embrace Unicode, is that I don't like making source code depend on characters that look similar--even with carefully chosen fonts--but that are different characters. To me this feels like that is a place where bugs can arise. There is a parallel in the printed math world. No mathematical text would make the meaning of an expression depend on whether it used · vs. ⬝, or ○︎ vs. ⚬︎, unless the font the document was printed with made those characters clearly distinguishable.

Kyle Miller (Sep 09 2024 at 03:04):

I think it would be helpful to ground this discussion in actual examples of confusing unicode @mars0i. I don't believe there are any cases in Lean itself of different similar-looking Unicode codepoints having surprisingly different meanings. The only · is · itself.

Mathlib does use some similar-looking unicode in some circumstances, but I don't believe there is example where there is a subtle difference between the meanings. Swapping one for the other likely won't work, or will clearly lead to errors soon enough. An egregious one is Π (for pi types) vs ∏ (for computing products), but it's hard to imagine this leading to any harm (in fact, (∏ i : ι, card (α i)) = card (Π i : ι, α i), so it's relatively harmless).

Kyle Miller (Sep 09 2024 at 03:04):

In my opinion, the much bigger thing to worry about, true no matter what restrictions you make on allowable subsets of codepoints, is that parsing of syntax can be different from what you expect. You can't solve this by sticking with ASCII. I can't remember an example in mathlib of confusing Unicode codepoints needing to be fixed, but I can think of theorems needing to be fixed because of unexpected parses.

Kyle Miller (Sep 09 2024 at 03:08):

(This is from the perspective of math — I know earlier you were talking about programming. Again, I'd suggest grounding the discussion in concrete examples rather than generalities.)

Daniel Weber (Sep 09 2024 at 03:10):

There's also ∑ and Σ. I agree that it isn't that harmful once you're used to it, but it can be confusing when you first learn it

mars0i (Sep 09 2024 at 03:12):

@Kyle Miller, you're right, I don't have examples of bugs that would arise in Lean. So that's good to know.

I do think that when you're first learning a language, errors can be mysterious, and you might not understand that you've used the wrong character. That happened to me with Agda when I didn't understand that its absolute value bar ∣ was not the ASCII bar |, which Agda uses for a different purpose. I didn't know why I was getting a huge mysterious error message, and nothing would compile. This is kind of a bug, but it's a bug in the learning process or in reading someone else's source code.

mars0i (Sep 09 2024 at 03:14):

(I asked a question here but later decided it was irrelevant--I figured out the answer. I don't see how to delete the post though.)

mars0i (Sep 09 2024 at 03:17):

I'm grounding this in general principles and intuitions based on a lot of experience with other (ASCII) languages. But I accept your point: I don't have any examples of bugs in code due to similarity of characters. That this is so may be the result of thoughtful design by Lean's language and library designers and maintainers, or maybe it's just not the sort of thing that would be an issue no matter what. But character similarity still seems a hurdle for new users and readers of code, I feel. I am often a new user of a language, so maybe this kind of thing matters to me more than others. I think a lot about code maintainability, too, in part because of commercial work I used to do, and in part because when I come back to my own code, sometimes I don't understand what I wrote. :smile: So I think about things that might make code confusing.

mars0i (Sep 09 2024 at 03:18):

I'm not going to change Lean as it is now, and I'm not trying to. So that's not what this is about. I'm not attacking Lean. I'm frustrated about some things because I like it.

Trebor Huang (Sep 09 2024 at 03:55):

As a datapoint: A very frequent confusion for Chinese beginners in any programming language (purely ASCII ones!) is the confusion of parentheses () for the non-ascii ones （）. So the best you can do is to only use one of them and stick with it, and it will not completely resolve the confusion even if it is ascii.

Siddhartha Gadgil (Sep 09 2024 at 04:18):

While I am all for unicode, it took me a long time to realize that \dot should be used in certain places (like focus) and not \cdot. My suggestion is that when hovering on a \cdot the message should include a warning that the user may have meant to use \dot.

I have similar issues with the double french brackets. I have to remember where they are used, search for this and hover to figure out how to type them. Again, I would suggest this information is given on hovering on, for example the single french bracket. Alternatively, there can be some way to see the important unicode symbols in VS codes in a group easily (operators, dots, brackets).

This is refining @Shreyas Srinivas's suggestion.

Kyle Miller (Sep 09 2024 at 04:23):

mars0i said:

Kyle Miller do you mean that because the user puts in the wrong character, and entire expression is parsed differently because the syntaxes of the two characters are different?

No, not individual character changes. In my experience, it's usually issues with precedence not being what you expect it to be. In my ∏ example above, I added some parentheses reflexively because of this — I'm pretty sure it's been fixed since a year or two ago, but at some point the = would have been "under" the product. (I see you deleted the question while I wrote this, but you might think this is interesting anyway.)

I'm grounding this in general principles and intuitions based on a lot of experience with other (ASCII) languages

The FRO is interested in feedback, but it's receptive to "here is a concrete example that causes concrete problems". I don't want to speak on behalf of the FRO here (disclosure: I work for it part time), but I'd say it's a long-standing principle that Lean embraces Unicode. Notice, though, that it uses it in a much lighter way than a language like Agda. I do understand that not knowing how to type certain characters is an impediment to learning, but I do think that overall Lean's use of non-ASCII is net positive.

I'd also like to mention that ASCII does not mean "typeable". Many "ASCII" languages are designed on a US keyboard, so use only characters that are available on that keyboard. Our French and German colleagues need to learn how to type ~ for example. I suppose as a trade, the escape character for names in Lean are guillemets («foo»).

That happened to me with Agda when I didn't understand that its absolute value bar ∣ was not the ASCII bar |

Mathlib uses plain vertical bar for absolute value, but it comes with its own challenges, since it requires it to be |x| with no whitespace to be unambiguous. It uses a different character for divisibility a ∣ b, since | can't reliably work here. It's not all sunshine, but you get used to it. The parse errors from mistakes in using | for divisibility are much worse.

Kyle Miller (Sep 09 2024 at 04:24):

@Siddhartha Gadgil A Mathematica-style character palette for VS Code would be interesting. Bonus if it could be dynamic based on imported syntax. A difficulty I face is when there's a concept I know, but the only way to figure out what to type is to find a relevant bit of Mathlib that uses the symbol and then hover over it to find the way to type it.

Siddhartha Gadgil (Sep 09 2024 at 04:29):

Kyle Miller said:

Siddhartha Gadgil A Mathematica-style character palette for VS Code would be interesting. Bonus if it could be dynamic based on imported syntax. A difficulty I face is when there's a concept I know, but the only way to figure out what to type is to find a relevant bit of Mathlib that uses the symbol and then hover over it to find the way to type it.

Exactly. For me (and I suppose beginers) it begins with even notation for focus (the \dot) and escaping (the double french bracket).

Siddhartha Gadgil (Sep 09 2024 at 04:30):

Maybe this can be shown (optionally) in the infoview with a widget.

Kyle Miller (Sep 09 2024 at 04:32):

(\. works for the focus dot by the way. Something I only learned recently, even though it's right in the hover, is that \f<<>> gives you the closing one at the same time.)

Edward van de Meent (Sep 09 2024 at 05:41):

Kyle Miller said:

(\. works for the focus dot by the way. Something I only learned recently, even though it's right in the hover, is that \f<<>> gives you the closing one at the same time.)

I believe this trick works for most (if not all) bracket styles...

Edward van de Meent (Sep 09 2024 at 05:42):

i.e. things like \[] and \<> give both left and right versions

mars0i (Sep 09 2024 at 05:57):

Appreciating all of the comments. @Kyle Miller thanks for all of the details, especially the comparison with Agda. That accords with my experience, but I wasn't sure given my lack of experience with Lean. (Not that I have a lot of experience with Agda.)

Marc Huisinga (Sep 09 2024 at 07:10):

Kyle Miller said:

Something I only learned recently, even though it's right in the hover

It has only been in the hover for a couple of months :-)

David Thrane Christiansen (Sep 09 2024 at 07:34):

Shreyas Srinivas said:

Here the lean manual was quoted and I am surprised it isn't verso-fied

I'm in the process of writing a new manual in Verso - stay tuned!

mars0i (Sep 09 2024 at 18:04):

I wonder whether these remarks in another thread that I started at the same time might provide a case in which confusion between similar characters is possible. I don't know, and I'm not trying to argue that there is such a case.

(I tried to keep the topics of the two threads separate, but helpful remarks in both turn out to relevant to each, it seems.)

mars0i (Sep 09 2024 at 18:32):

It seems as if there are a number of characters used in Lean that are similar. Maybe in practice it's easy to detect mistakes using them, and maybe in practice people reading code won't usually be confused. I accept that Lean is going to use all of these characters, in any event. I wonder whether the following might be useful for Lean, or for Unicode use in general in some other contexts.

In many fonts, zero is clearly distinguished from capital O, and some fonts clearly distinguish a small L from the numeral 1 (not very many!). Some pieces of software go further and render numbers and letters in different colors. This is very helpful in password managers.

I wonder whether it might be useful to displaying similar Unicode characters in different colors. This might be a pain to implement, and it would interfere with other syntax coloring. Or maybe display similar characters in bold vs. normal, or italic vs. regular? I don't know. Just something that occurred to me. No idea whether it would be worth the trouble.

Bhavik Mehta (Sep 10 2024 at 08:41):

Daniel Weber said:

There's also ∑ and Σ. I agree that it isn't that harmful once you're used to it, but it can be confusing when you first learn it

Similarly ∣ and | are also confusing to beginners

Shreyas Srinivas (Sep 10 2024 at 09:10):

The reason people aren't easily confused is that the wrong operator denotes the wrong construct which usually results in a warning or error. In the worst case, one control clicks the operator and is taken to the definition of that notation.

That being said, I wish there was a bit of bi-directional documentation. Hovering over a piece of notation should tell you what function is being called and there ought to be a "did you mean?" Tooltip on infoview message where one can find similar lookimg alternatives. Then clicking on them or applying a quick fix.

Finally, I find the suggestion to have ASCII alternatives for many or all unicode characters to be a dead end due to the effort involved and the fact that the symbol space of useful symbols quickly gets unusable without priority manipulation

Jon Eugster (Sep 10 2024 at 09:16):

On a tangential note, sometimes it's also worth to revisit similar characters and see if they are still needed (for technical reasons) or if one could use the same unicode character.

For example in #6475 I've once experimented with using only one of the two dots ∙ and • (Submodule.span and scalar multiplication). I've never really polished it, but the test looked like it could work in practice. Some of these almost identical characters might come from past Lean parsing problems which there may or may not be a valid workaround.

Eric Wieser (Sep 10 2024 at 09:33):

Perhaps worth noting that this just trades one problem for another. Having every unicode character be visually unambiguous isn't terribly helpful if the underlying notation is resolved more ambiguously to make up for it.

Eric Wieser (Sep 10 2024 at 09:33):

That is: the thing you really care about when reading lean is not "what unicode character is this" but "what does this notation mean"

Shreyas Srinivas (Sep 10 2024 at 10:50):

I propose the following : an attribute for notation/notation3/syntax that links it to a definition. Then when docstrings are compiled, the notation as well as the definition gets a docstring that corresponds to the def + a note about that notation and how to type and what other similar symbols are like. Ofc this docstring shouldn't override any syntax or notation's existing docstring. This way tactic docstrings remain intact.

This way the tool tip for notation and the docstrings are integrated where possible

Jon Eugster (Sep 10 2024 at 12:58):

@Shreyas Srinivas is this roughly doing what you would want?

import Mathlib

/-- Docstring of `something`. -/
def something (n : Nat) := 42 + n

#check something
-- hover text:
--   Docstring of something.

@[inherit_doc something]
notation (name := myNotation) x "⬝⬝" => something x

extend_docs something after "`n ⬝⬝` is a notation for this."

#check something
-- hover text:
--   Docstring of something.
--   n ⬝⬝ is a notation for this.

#check 50 ⬝⬝
-- hover text:
--   Docstring of something.
--   Type ⬝ using \tr or \dot or \con or \cdot

Shreyas Srinivas (Sep 10 2024 at 13:00):

Jon Eugster said:

Shreyas Srinivas is this roughly doing what you would want?

import Mathlib

/-- Docstring of `something`. -/
def something (n : Nat) := 42 + n

#check something
-- hover text:
--   Docstring of something.

@[inherit_doc something]
notation (name := myNotation) x "⬝⬝" => something x

extend_docs something after "`n ⬝⬝` is a notation for this."

#check something
-- hover text:
--   Docstring of something.
--   n ⬝⬝ is a notation for this.

#check 50 ⬝⬝
-- hover text:
--   Docstring of something.
--   Type ⬝ using \tr or \dot or \con or \cdot

Close enough. I would just add a couple of other things. The ascii shortcut for the unicode notation. Having two different tooltips is confusing and trying to trigger one over the other requires some acrobatics.

Shreyas Srinivas (Sep 10 2024 at 13:01):

and a link to similar notation

Eric Wieser (Sep 10 2024 at 13:01):

I think including the shortcut in the docstring is a mistake, because then it can be wrong.

Shreyas Srinivas (Sep 10 2024 at 13:01):

Actually maybe the second one is too much to ask since someone has to compile all these similar notations

Shreyas Srinivas (Sep 10 2024 at 13:01):

So just the first one

Jon Eugster (Sep 10 2024 at 13:02):

Shreyas Srinivas said:

and a link to similar notation

but that certainly has to be written (and kept updated) manually, right?

Shreyas Srinivas (Sep 10 2024 at 13:03):

Yeah, let's scratch that similar notation part for now. That's more complex.

Shreyas Srinivas (Sep 10 2024 at 13:04):

Ideally if there is a warning or error on notation, a "did you mean the following: <list> <item> <notation> - definition</item> <item>.....</list> would be better

Shreyas Srinivas (Sep 10 2024 at 13:04):

But that can be separate from this inherit docs thing

Jon Eugster (Sep 10 2024 at 13:07):

Eric Wieser said:

I think including the shortcut in the docstring is a mistake, because then it can be wrong.

I'm not sure to which message this is an answer to, so just to be explicit: The "Type ⬝ using \tr or \dot or \con or \cdot" above is the auto-generated text the Lean extension creates when hovering over a unicode character. And it's indeed a bad idea to write this manually anywhere, and probably there is also no need to.

Shreyas Srinivas (Sep 10 2024 at 13:10):

I am just suggesting to automatically fuse it with the notation's docstring which is made identical to the docstring of the declaration

Shreyas Srinivas (Sep 10 2024 at 13:11):

so if someone hovers the mouse over notation, they get both things: the docstring and the ascii shortcut text.

Shreyas Srinivas (Sep 10 2024 at 17:26):

The reason I am suggesting this is that currently the simplest way to trace back what a notation denotes is to ctrl + click it and then hover the mouse over the declaration to get its docstring. Two steps where one ought to suffice.

Bhavik Mehta (Sep 10 2024 at 18:27):

Shreyas Srinivas said:

The reason people aren't easily confused is that the wrong operator denotes the wrong construct which usually results in a warning or error. In the worst case, one control clicks the operator and is taken to the definition of that notation.

This is a cute argument but unfortunately it's empirically not true in practice - this example is the notational problem I've seen confuse beginners more than any other!

Shreyas Srinivas (Sep 10 2024 at 18:30):

I complained about this exact thing in early late 2022 or early 2023, and I was told that lean is not meant to be used with a text editor without extensive support like the extension. I have learnt since then that this is true.

mars0i (Sep 10 2024 at 19:27):

Jon Eugster said:

Shreyas Srinivas said:

and a link to similar notation

but that certainly has to be written (and kept updated) manually, right?

What about a crowdsourced list of similar characters on github or somewhere else public, that could be pulled from. I don't know enough about docstring, etc. generation to know whether that makes sense.

mars0i (Sep 10 2024 at 19:40):

Shreyas Srinivas said:

I complained about this exact thing in early late 2022 or early 2023, and I was told that lean is not meant to be used with a text editor without extensive support like the extension. I have learnt since then that this is true.

I am learning that this is the right way to use Lean, and I accept it, and am not trying to change it. (I have learned from this thread.) But ... note that this sounds like: notation in source code is sometimes is not sufficiently clear to be read on its own, so one ought to read it with the help of software that adds additional notation on demand. That's not so different from the request that source code (in any language) include comments, though maybe there's an additional hurdle in the case of Lean notation. I assume the tradeoff is well worth it. (I'm thinking here about reading and understanding source code rather than writing it. Writing is different because then you're already interacting with Lean--one of the great things about it--and you're likely to get feedback if you do the wrong thing.)

R Dong (Dec 06 2024 at 15:22):

It's much more challenging to try to learn Lean from a book due to the use of unicode, because there's not really a way to "reverse lookup" a unicode symbol. In fact, many unicode symbols shown in PDF are composed of multiple glyphs, and won't even copy correctly.

Shanghe Chen (Dec 06 2024 at 15:47):

Yeah recognizing and typing unicode letters is challenging. For the first purpose some tips like copying the glyphs and then using language like python to get the code point of it may be helpful like hex(ord('∀')) results in '0x2200' in python. In this way it may help distinguishing different ambiguous unicodes. For typing unicode I am considering to start writing an input method schema dedicated to Lean using rime, which is an input method mainly used for typing Chinese characters that contains thousands of different unicodes. The situation maybe similar but I don’t know if input method is globally used or just in some regions.

R Dong (Dec 06 2024 at 16:27):

As I mentioned, copying unicode doesn't always work. This, for instance:
image.png, copies to a right arrow.

Input methods are language specific. Most Chinese IM, for instance, won't have a way to typeset all these symbols. If I can't find a character in vscode lean extension or the latex input plugin, I am kind of screwed.

Michael Stoll (Dec 06 2024 at 16:30):

In the ∀ menu in VSCode, click on "Documentation..." and then on the last entry. This gives you a list of all Unicode abbreviations. (E.g., ↦ is available as "\mapsto" or a unique prefix of that like "\map").

R Dong (Dec 06 2024 at 16:34):

I didn't find it the first time I read it, because I was doing a reverse lookup from the unicode symbol I see in the Hitchhiker's guide.

Julian Berman (Dec 06 2024 at 16:42):

If I can't find a character in vscode lean extension or the latex input plugin, I am kind of screwed.

Can you say a bit more here on what you mean? All the characters you see used in Lean (really Mathlib) should be typeable via the VSCode Lean extension -- every so often this doesn't turn out to be the case but it's not common. Was there a specific example you couldn't type?

R Dong (Dec 06 2024 at 16:44):

Yeah, all of them can be typed, but sometimes it's difficult to find because printed work and PDFs can use a different way (or typeface?) to represent the same symbol. Like the |-> symbol above.

Michael Stoll (Dec 06 2024 at 16:45):

For most symbols, the usual LaTeX macro works (like in the example above).

Julian Berman (Dec 06 2024 at 16:45):

R Dong said:

Yeah, all of them can be typed, but sometimes it's difficult to find because printed work and PDFs can use a different way (or typeface?) to represent the same symbol. Like the |-> symbol above.

Got it, OK, I think "Lean is hard to learn or read via static reading mediums" is definitely reasonable sounding (though I think the general long term solution is 'mostly have dynamic reading mediums').

R Dong (Dec 06 2024 at 16:47):

That's not really possible. Books will exist.

R Dong (Dec 06 2024 at 16:48):

In fact, since Lean is so heavily academics, I would assume many would publish papers. Papers are usually typeset differently as well.

Julian Berman (Dec 06 2024 at 16:49):

Well it's certainly possible, I didn't say they wouldn't exist -- I think the general feeling is that Lean has to be used with an infoview. Not knowing how to type symbols is not as big of an issue as not being able to see the state of the world at all.

Shreyas Srinivas (Dec 06 2024 at 16:50):

As Mario explained before, that ship has sailed

Shreyas Srinivas (Dec 06 2024 at 17:08):

But arguably this issue of PDF math also applies to LaTeX, so it is not unique to lean

Patrick Massot (Dec 06 2024 at 17:28):

In the near future Verso will finish making this conversation completely pointless since it brings so much that a pdf or printed web page will never give.

mars0i (Dec 06 2024 at 18:20):

PDFs are valuable electronic documents, for me, because they're easy to annotate with tools that I already know, and a PDF makes it easy to have an entire book available when I have to be offline.

It's important to me to be able to annotate in the way that I want--highlighting with colors and underlines, adding notes, arrows, etc. This is how I read/think/learn.

There are tools available for annotating web pages, and I've started to investigate them--because of Lean's documentation--but they are new and have drawbacks, whereas PDF is an old technology with tools that I trust.

(Will Verso will allow annotation by readers? I didn't think that's one of the goals. If it will, that's great! But that's a lot to ask for; Verso already has a high standard it's trying to meet.)

fwiw @David Thrane Christiansen said in another thread a while ago that producing nice PDFs was one goal for Verso. The current PDFs are not a great substitute for the online version, because there's no navigation--not even a text table of contents--but that will change, I'm sure.

mars0i (Dec 06 2024 at 18:30):

About printed books, this is another "user interface" that I like. Make it very easy to flip back and forth between different pages, and to sit and read in a thoughtful way, maybe with the help of a pad of paper. Also: annotation is easy, and allows all possible mathematical notations, arrows, etc., using a pencil. :smile:

If a textbook is designed to be printed, then the author has a responsibility to provide tips about how to type less-common characters. The Agda language also uses a lot of Unicode, and there are printed textbooks, but they usually provide character-typing tips. One still needs to be able to look up input strings for other characters that you find online, but editor extensions let you do that.

mars0i (Dec 06 2024 at 18:33):

(I started this thread a while ago, and though I still have a few concerns concerning Lean's use of Unicode, I don't object in the way that I did earlier.)

Trebor Huang (Dec 06 2024 at 18:37):

Agda tutorials all have little sections that summarize the unicode symbols used in each chapter, which are intended for printed medium. I would argue learning any programming language from a printed medium is a mistake, but ymmv.

Trebor Huang (Dec 06 2024 at 18:38):

Ah it's already mentioned above

Mario Carneiro (Dec 06 2024 at 18:41):

are the sections human written? Do you have an example?

mars0i (Dec 06 2024 at 18:42):

Some of my original negativity about Unicode in Lean was based on experience with Agda. So far, my experience with Lean is that its use of Unicode is not as bothersome as Agda, but the difference has to do with subtleties that would require a bit of thought to spell out. Probably not worth the effort.

mars0i (Dec 06 2024 at 18:43):

@Mario Carneiro: Human written. Here's one: https://plfa.github.io/Naturals/#unicode

Julian Berman (Dec 06 2024 at 18:46):

That does look nice -- but also seemingly easy to autogenerate (presumably by Verso) by just reflecting on the characters in the current document? Even if they had to occasionally have some actual hand-written addendum explaining a specific confusion (like disambiguating a portion of the category theory library was using one arrow glyph or another or something)

mars0i (Dec 06 2024 at 18:46):

Or look at PDF page 62 in the sample version of Maguire's Certainty By Construction.

mars0i (Dec 06 2024 at 18:47):

Stump's Verified Functional Programming in Agda has this kind of thing as well.

R Dong (Dec 06 2024 at 19:09):

I haven't used Agda.

As I say, it's not possible to get around PDF and printed material. People will publish to journals. No amount of electronic documentation or dynamic content will replace them, because printed material is archival, and dynamic document is not. I'm not talking just about learning material for the language, but research papers.

The difficulty I encountered today was that I went straight into a chapter in the middle, encountered a notation, then I couldn't find if such a table exist in any chapter, because single Unicode symbols are not searchable in the PDF, in search engine, in the documentation site, or in browser Ctrl-F. However, if there's a better way to reverse-lookup Unicode symbols, the issue I have may be alleviated. Perhaps a textbook should also include a lookup table connecting the typeface used in text, the corresponding Unicode symbol, and how to type it. Research paper that introduces new notation would also probably have this.

Theory-related tools seem to really like Unicode, which as a programmer I am very not used to seeing. In most general purpose languages, even if Unicode identifiers are supported, they are not recommended.

David Thrane Christiansen (Dec 06 2024 at 19:49):

Good PDF and EPUB support is a goal of Verso, but not a short-term one. For now, Verso is being developed precisely in the ways that support the new language reference.

I personally use a print stylesheet and my Remarkable for proofreading, but it's not what I'd call a "good PDF". Only Chrome seems capable of even this - Firefox doesn't rewrite internal links to PDF page references, which makes the ToC not so useful because CSS has no way of generating page number references, and Chromium crashes when I try to print anything over a couple hundred pages to PDF.

Doing better would be great, but it's lower priority than actually writing docs at the moment. There are lots of nice things we can do to make the experience of reading a PDF better for Lean, like automatically generating a table of notation used, and these things will also benefit online readers. It's just a matter of priorities at this point.

David Thrane Christiansen (Dec 06 2024 at 19:50):

I also like learning programming languages from printed books - a good book in the woods with a thermos of tea, bracketed by a bike ride, is a great way to prepare for a focused session of experimentation! I wouldn't want a printed reference work, but didactic materials can be very good in that format.

mars0i (Dec 06 2024 at 21:02):

R Dong said:

The difficulty I encountered today was that I went straight into a chapter in the middle, encountered a notation, then I couldn't find if such a table exist in any chapter, because single Unicode symbols are not searchable in the PDF, in search engine, in the documentation site, or in browser Ctrl-F.

@David Thrane Christiansen has indicated in the past that improving the Verso search is one goal. It's definitely frustrating sometimes. Here is a recent comment: https://leanprover.zulipchat.com/#narrow/channel/113488-general/topic/Discussion.20thread.20for.20Lean.20Language.20Reference/near/478105522 Sounds like you're working from a PDF, though.

I've had success with searching for Unicode characters using Ctrl-F in MacOS Firefox in Verso documents such as FPIL, but that only works if you already are already on the relevant page.

Copying code into an editor from the browser and then using a lookup function there is how I find out how to type a character. Not ideal but it works. I haven't tried that from a PDF, though. At present, I treat PDFs of Verso as backup plan. A browser is better.

mars0i (Dec 06 2024 at 21:17):

I'm able to successfully search for some Unicode in a Verso-derived PDF using Skim in MacOS, MacOS Acrobat Reader, and iAnnotate in iOS. Maybe it only works with some characters?

Winston Yin (尹維晨) (Dec 06 2024 at 23:17):

This is not the most useful format, but has helped me find a number of Unicode characters I needed in mathlib: (see Kyle's link below)

Kyle Miller (Dec 06 2024 at 23:27):

@Winston Yin (尹維晨) That's accessible from within VS Code:
Michael Stoll said:

In the ∀ menu in VSCode, click on "Documentation..." and then on the last entry. This gives you a list of all Unicode abbreviations. (E.g., ↦ is available as "\mapsto" or a unique prefix of that like "\map").

It's also in the command menu under > Lean 4: Docs: Show unicode input abbreviations. For me it shows up with just >unicode.

Kyle Miller (Dec 06 2024 at 23:28):

Big warning, your link is for the Lean 3 extension. I think this is the one for Lean 4: https://github.com/leanprover/vscode-lean4/blob/master/lean4-unicode-input/src/abbreviations.json

Eric Wieser (Dec 07 2024 at 02:53):

R Dong said:

PDFs can use a different way (or typeface?) to represent the same symbol.

If a PDF is not encoding its Lean code as copyable Unicode, then this is arguably the fault of the pdf author or publisher

Eric Wieser (Dec 07 2024 at 03:02):

(it is very unfortunate that even arXiv does everything it can to force the author to do this wrong)

Yury G. Kudryashov (Dec 07 2024 at 03:10):

If there is a LaTeX setup that works with arXiv and produces correct copyable Unicode, then I think that we should promote it somewhere on the website.

Eric Wieser (Dec 07 2024 at 04:02):

I know of one, but it involves defeating a mechanism that arXiv has in place to prevent uploading PDFs without pdfLaTeX source

Eric Wieser (Dec 07 2024 at 04:03):

And so the act of promoting it might result in it not working any more

Yury G. Kudryashov (Dec 07 2024 at 04:05):

So, their version of pdflatex can't produce correct PDF files?

Eric Wieser (Dec 07 2024 at 04:06):

I think pdfLaTeX doesn't produce Unicode text layers at all, but maybe I'm misremembering and you can make it do so with great pain using accsupp (edit: this only works in very few PDF readers)

Last updated: May 02 2025 at 03:31 UTC