## Stream: general

### Topic: unicode

#### Kenny Lau (Mar 31 2018 at 05:05):

Does every symbol have a non-unicode equivalent?

#### Mario Carneiro (Mar 31 2018 at 05:09):

No. Most do, but the strict policy on this was dropped a while ago and new features have crept in with no non-unicode equivalents. Basically no one uses lean without unicode anymore (if they ever did), so it wasn't felt necessary to continue to support.

#### Mario Carneiro (Mar 31 2018 at 05:11):

Here's one:

theorem «a strange-identifier!» : true := trivial


the double french quotes escape everything, and have no non-unicode equivalent

#### Johan Commelin (Jun 15 2019 at 05:05):

I thought it might be a good idea to use p̌ (p breve) as a variable. But Lean says "unexpected token". I thought Lean has full unicode support. What is going on?

#### Bryan Gin-ge Chen (Jun 15 2019 at 05:32):

From some brief googling it looks like the accent mark is created by a "modifier" unicode character acting on p and so I guess that modifier character isn't allowed by Lean. @Edward Ayers has a page on Unicode in Lean and in the section on accents he warns "It's an absolute minefield." :shrug:

#### Johan Commelin (Jun 15 2019 at 05:36):

Ok, thanks for the info!

#### Floris van Doorn (Jun 15 2019 at 08:30):

They cannot be used in identifiers, but they can be used as notation. For example in Flypitch: https://github.com/flypitch/flypitch/blob/64d2bc25e033fb294c7e8d1c3dfba3d26f60b4a8/src/bvm.lean#L1281

postfix ̌ :90 := check
lemma check_type {α : Type u} {A : α → pSet} : bSet.type ((pSet.mk α A)̌  : bSet 𝔹) = α := rfl
lemma check_type' {x : pSet.{u}} : bSet.type (x̌ : bSet 𝔹) = x.type


Note that the x̌ is not a variable, but the check-operation applied to x.

#### Floris van Doorn (Jun 15 2019 at 08:31):

I'm not sure why Zulip is displaying x̌ once as two separate characters, and once as a x with a check on top of it...

#### matt rice (Jun 15 2019 at 17:16):

@Edward Ayers here is some more supercripts/subscripts (not all of which i have tried in lean)
greek, caps, and some more latin letters scattered about
https://github.com/ratmice/lumpy-leandoc/blob/master/src/tex_gen.rs#L366-L373

#### Marc Huisinga (Jun 15 2019 at 17:21):

if you're looking for the corresponding lean vscode shortcuts for some of these symbols, here's a list: https://github.com/leanprover/vscode-lean/blob/master/translations.json

#### Edward Ayers (Jun 16 2019 at 21:06):

Lean (quite sensibly, imo) doesn't let you use arbitrary unicode characters in identifiers. The rules for what characters Lean allows are in src/util/name.cpp. Which is:
- Any alphanumeric letter (no starting with numbers though)
- lowercase greek except lambda α β γ δ ε ζ η θ ι κ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω
- uppercase greek except Pi and Sigma Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Ρ Τ Υ Φ Χ Ψ Ω
- coptic ϊϋόύώϏβθΥϓϔφπϗϘϙϚϛϜϝϞϟϠϡϢϣϤϥϦϧϨϩϪϫϬϭϮϯκρςϳΘε϶ϷϸϺϻ
- Polytonic greek extended character set https://www.unicode.org/charts/PDF/U1F00.pdf
- Letter like symbols https://www.unicode.org/charts/PDF/U2100.pdf ℂ°C℄Ɛ℈°Fℊℋℌℍℎℏℐℑℒℓ℔ℕNo℗℘ℙℚℛℜℝ℞℟℠TEL™℣ℤ℥Ω℧ℨ℩KÅℬℭ℮ℯℰℱℲℳℴאבגדℹ℺FAXℼℽℾℿ⅀⅁⅂⅃⅄ⅅⅆⅇⅈⅉ⅊⅋⅌⅍ⅎ⅏
- Some of the mathematical letters in https://www.unicode.org/charts/PDF/U1D400.pdf

𝒜𝒞𝒟𝒢𝒥𝒦𝒩𝒪𝒫𝒬𝒮𝒯𝒰𝒱𝒲𝒳𝒴𝒵𝒶𝒷𝒸𝒹𝒻𝒽𝒾𝒿𝓀𝓁𝓂𝓃𝓅𝓆𝓇𝓈𝓉𝓊𝓋𝓌𝓍𝓎𝓏𝓐𝓑𝓒𝓓𝓔𝓕𝓖𝓗𝓘𝓙𝓚𝓛𝓜𝓝𝓞𝓟𝓠𝓡𝓢𝓣𝓤𝓥𝓦𝓧𝓨𝓩𝓪𝓫𝓬𝓭𝓮𝓯𝓰𝓱𝓲𝓳𝓴𝓵𝓶𝓷𝓸𝓹𝓺𝓻𝓼𝓽𝓾𝓿𝔀𝔁𝔂𝔃𝔄𝔅𝔇𝔈𝔉𝔊𝔍𝔎𝔏𝔐𝔑𝔒𝔓𝔔𝔖𝔗𝔘𝔙𝔚𝔛𝔜𝔞𝔟𝔠𝔡𝔢𝔣𝔤𝔥𝔦𝔧𝔨𝔩𝔪𝔫𝔬𝔭𝔮𝔯𝔰𝔱𝔲𝔳𝔴𝔵𝔶𝔷𝔸𝔹𝔻𝔼𝔽𝔾𝕀𝕁𝕂𝕃𝕄𝕆𝕊𝕋𝕌𝕍𝕎𝕏𝕐𝕒𝕓𝕔𝕕𝕖𝕗𝕘𝕙𝕚𝕛𝕜𝕝𝕞𝕟𝕠𝕡𝕢𝕣𝕤𝕥𝕦𝕧𝕨𝕩𝕪𝕫𝕬𝕭𝕮𝕯𝕰𝕱𝕲𝕳𝕴𝕵𝕶𝕷𝕸𝕹𝕺𝕻𝕼𝕽𝕾𝕿𝖀𝖁𝖂𝖃𝖄𝖅𝖆𝖇𝖈𝖉𝖊𝖋𝖌𝖍𝖎𝖏𝖐𝖑𝖒𝖓𝖔𝖕𝖖𝖗𝖘𝖙𝖚𝖛𝖜𝖝𝖞𝖟

AFAIK you can use any unicode character in the notation parser.

#### Edward Ayers (Jun 16 2019 at 21:16):

In addition, if it's not at the head of the identifier you can use numbers and ⁿ₀₁₂₃₄₅₆₇₈₉ₐₑₒₓₔₕₖₗₘₙₚₛₜᵢᵣᵤᵥᵦᵧᵨᵩᵪ.

#### Edward Ayers (Jun 16 2019 at 21:25):

The basic problems with unicode accents are:
- they are rendered inconsistently depending on font and platform (eg sometimes the accent is hidden behind the character). Often they will be rendered in the wrong place (as evidenced by zulip).
- there are multiple sequences of unicode characters which look the same or very similar on the screen but which Lean will read differently.
- It can be finicky to get the editor to input diacritical marks correctly, for example vscode treats č as two characters so you have to click right twice to go to the other side of it which is jarring, especially if you don't know how it was made. Also I notice in zulip editor that you can enter wierd states where the diacritical mark is bound to a space and so is rendered slightly before the character even though it looked fine in vscode ... etc

Last updated: May 15 2021 at 23:13 UTC