Zulip Chat Archive

Stream: lean4

Topic: Percent-encoded URLs mess up docgen redirect


Mario Carneiro (Mar 18 2025 at 01:39):

It's a bit difficult to demonstrate this bug because most browsers automatically do percent-decoding, but the following link does not work if you go to it directly, e.g. via wget: https://leanprover-community.github.io/mathlib4_docs/find/?pattern=CategoryTheory.nerve%E2%82%82Adj#doc . I encountered this issue because links inside PDF documents do not do this decoding process, and so the string is sent unchanged as the find/ query string, which ends up redirecting to https:/​/​leanprover-community.github.io/​mathlib4_docs/​404.html#CategoryTheory.nerve%E2%82%82Adj. This is apparently per spec - PDF does not let you put unicode in URLs at all, so AFAICT this makes it impossible to link to declarations using non-ASCII using the equivalent of docs#CategoryTheory.nerve₂Adj .

Mario Carneiro (Mar 18 2025 at 01:40):

...and of course the linkifier breaks too. Pretend I wrote docs#CategoryTheory.nerve₂Adj above. (But I will leave it unedited because it also seems to make a point, although I'm not sure what the takeaway should be, besides "non-ASCII in identifiers is still non-portable".)

Mario Carneiro (Mar 18 2025 at 01:57):

This seems to only be an issue with find/, if one links directly to the docgen page https://leanprover-community.github.io/mathlib4_docs/Mathlib/AlgebraicTopology/SimplicialSet/NerveAdjunction.html#CategoryTheory.nerve%E2%82%82Adj this works as intended (because the browser is interpreting the escapes). I think the source of the bug is the fact that find/ bypasses the usual browser handling to be more "lean-friendly"

Mario Carneiro (Mar 18 2025 at 02:13):

actually never mind, it was user error (the URL was double percent encoded). find.js is correctly decoding the URL fragment.


Last updated: May 02 2025 at 03:31 UTC