Zulip Chat Archive

Stream: general

Topic: unicode

Kenny Lau (Mar 31 2018 at 05:05):

Does every symbol have a non-unicode equivalent?

Mario Carneiro (Mar 31 2018 at 05:09):

No. Most do, but the strict policy on this was dropped a while ago and new features have crept in with no non-unicode equivalents. Basically no one uses lean without unicode anymore (if they ever did), so it wasn't felt necessary to continue to support.

Mario Carneiro (Mar 31 2018 at 05:11):

Here's one:

theorem «a strange-identifier!» : true := trivial

the double french quotes escape everything, and have no non-unicode equivalent

Johan Commelin (Jun 15 2019 at 05:05):

I thought it might be a good idea to use (p breve) as a variable. But Lean says "unexpected token". I thought Lean has full unicode support. What is going on?

Bryan Gin-ge Chen (Jun 15 2019 at 05:32):

From some brief googling it looks like the accent mark is created by a "modifier" unicode character acting on p and so I guess that modifier character isn't allowed by Lean. @Edward Ayers has a page on Unicode in Lean and in the section on accents he warns "It's an absolute minefield." :shrug:

Johan Commelin (Jun 15 2019 at 05:36):

Ok, thanks for the info!

Floris van Doorn (Jun 15 2019 at 08:30):

They cannot be used in identifiers, but they can be used as notation. For example in Flypitch: https://github.com/flypitch/flypitch/blob/64d2bc25e033fb294c7e8d1c3dfba3d26f60b4a8/src/bvm.lean#L1281

postfix `̌ `:90 := check
lemma check_type {α : Type u} {A : α  pSet} : bSet.type ((pSet.mk α A)̌  : bSet 𝔹) = α := rfl
lemma check_type' {x : pSet.{u}} : bSet.type (x̌ : bSet 𝔹) = x.type

Note that the is not a variable, but the check-operation applied to x.

Floris van Doorn (Jun 15 2019 at 08:31):

I'm not sure why Zulip is displaying once as two separate characters, and once as a x with a check on top of it...

matt rice (Jun 15 2019 at 17:16):

@Edward Ayers here is some more supercripts/subscripts (not all of which i have tried in lean)
greek, caps, and some more latin letters scattered about

Marc Huisinga (Jun 15 2019 at 17:21):

if you're looking for the corresponding lean vscode shortcuts for some of these symbols, here's a list: https://github.com/leanprover/vscode-lean/blob/master/translations.json

Edward Ayers (Jun 16 2019 at 21:06):

Lean (quite sensibly, imo) doesn't let you use arbitrary unicode characters in identifiers. The rules for what characters Lean allows are in src/util/name.cpp. Which is:
- Any alphanumeric letter (no starting with numbers though)
- lowercase greek except lambda α β γ δ ε ζ η θ ι κ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω
- uppercase greek except Pi and Sigma Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Ρ Τ Υ Φ Χ Ψ Ω
- coptic ϊϋόύώϏβθΥϓϔφπϗϘϙϚϛϜϝϞϟϠϡϢϣϤϥϦϧϨϩϪϫϬϭϮϯκρςϳΘε϶ϷϸϺϻ
- Polytonic greek extended character set https://www.unicode.org/charts/PDF/U1F00.pdf
- Letter like symbols https://www.unicode.org/charts/PDF/U2100.pdf ℂ°C℄Ɛ℈°Fℊℋℌℍℎℏℐℑℒℓ℔ℕNo℗℘ℙℚℛℜℝ℞℟℠TEL™℣ℤ℥Ω℧ℨ℩KÅℬℭ℮ℯℰℱℲℳℴאבגדℹ℺FAXℼℽℾℿ⅀⅁⅂⅃⅄ⅅⅆⅇⅈⅉ⅊⅋⅌⅍ⅎ⅏
- Some of the mathematical letters in https://www.unicode.org/charts/PDF/U1D400.pdf


AFAIK you can use any unicode character in the notation parser.

Edward Ayers (Jun 16 2019 at 21:16):

In addition, if it's not at the head of the identifier you can use numbers and ⁿ₀₁₂₃₄₅₆₇₈₉ₐₑₒₓₔₕₖₗₘₙₚₛₜᵢᵣᵤᵥᵦᵧᵨᵩᵪ.

Edward Ayers (Jun 16 2019 at 21:25):

The basic problems with unicode accents are:
- they are rendered inconsistently depending on font and platform (eg sometimes the accent is hidden behind the character). Often they will be rendered in the wrong place (as evidenced by zulip).
- there are multiple sequences of unicode characters which look the same or very similar on the screen but which Lean will read differently.
- It can be finicky to get the editor to input diacritical marks correctly, for example vscode treats as two characters so you have to click right twice to go to the other side of it which is jarring, especially if you don't know how it was made. Also I notice in zulip editor that you can enter wierd states where the diacritical mark is bound to a space and so is rendered slightly before the character even though it looked fine in vscode ... etc

Last updated: Dec 20 2023 at 11:08 UTC