Zulip Chat Archive

Stream: Zulip meta

Topic: docs linkifier fails on Greek letter


Junyan Xu (Jul 19 2022 at 04:20):

Example: docs#Action.ρ_Aut

Edward van de Meent (Dec 11 2024 at 21:44):

it seems this is still an issue: docs#zpow_lt_zpow_iff_right₀ . it seems it stops linkifying at the last non-word-character before a non-ascii character. it isn't just that that part of the text is non-clickable, it also doesn't include that part in the actual link.

Alya Abbott (Dec 18 2024 at 23:12):

https://zulip.com/help/add-a-custom-linkifier#advanced-linkifier-patterns may be relevant.

Bryan Gin-ge Chen (Dec 18 2024 at 23:19):

So the current linkifier we're using for this consists of docs#(?P<decl>[\w_.'-]*[\w_?!'-]) and https://leanprover-community.github.io/mathlib4_docs/find/?pattern={decl}#doc.

I've got my hands full for a bit, does anyone want to take a stab at figuring out the right regex incantation to use?

Edward van de Meent (Dec 18 2024 at 23:37):

that explains it, i guess... the regex is faulty

Edward van de Meent (Dec 18 2024 at 23:50):

perhaps an idea might be to use something along the lines ofdocs#(?P<decl>[^\s]+)(\s|$), i.e. match anything up to the next ascii whitespace or end of line?

Edward van de Meent (Dec 18 2024 at 23:51):

(or something along those lines which also makes sure the matched part can't end in ., like docs#(?P<decl>[^\s]*[^\s.])\.?(\s|$))

Bryan Gin-ge Chen (Dec 19 2024 at 00:49):

testing:
docs#Action.ρ_Aut. (edit, ah, I guess the correct name is docs#Action.ρAut) docs#zpow_lt_zpow_iff_right₀.

Bryan Gin-ge Chen (Dec 19 2024 at 00:54):

Thanks Edward, almost there, I think! You can see that we matched the right parentheses in docs#Action.ρAut), so we'll need a fuller list of separators.

After previewing this message, it also looks like we have to add more punctuation symbols, since the comma is screwing things up above as well.

Bryan Gin-ge Chen (Dec 19 2024 at 00:55):

(I've reverted back to what we had before for the time being, so people don't get too confused.)

Eric Wieser (Dec 19 2024 at 01:09):

In lean3 the regex was (?![λΠΣ])[_a-zA-Zα-ωΑ-Ωϊ-ϻἀ-῾℀-⅏𝒜-𝖟](?:(?![λΠΣ])[_a-zA-Zα-ωΑ-Ωϊ-ϻἀ-῾℀-⅏𝒜-𝖟0-9'ⁿ-₉ₐ-ₜᵢ-ᵪ])*


Last updated: May 02 2025 at 03:31 UTC