Zulip Chat Archive

Stream: lean4

Topic: Uppercase accented characters


Damiano Testa (Sep 19 2025 at 13:34):

Is there a function that returns true on uppercase characters, whether they are accented or not and returns no otherwise?

What I would like is for Ó, Á, Ö, Ê,... to be considered uppercase along with everything that docs#Char.isUpper considers uppercase.

Julia Scheaffer (Sep 19 2025 at 13:51):

Would you want to consider capital letters from non-Latin scripts as well? (Such as Α - capital alpha) If you do, the proper way to do it would be using the Unicode Categories and checking if your character is in the Lu (uppercase letter) category. I don't know if someone has written code in lean to process Unicode categories, but it seems like lean itself is not aware of any properties about BMP or above characters.

If you're not trying to be as complete as possible, you might just want to construct a set of characters you would consider to be upper case.

Julia Scheaffer (Sep 19 2025 at 13:57):

Unicode's uppercase letter category only has 1791 code points in it, a reasonable amount compared to some of the other Unicode categories.

Henrik Böving (Sep 19 2025 at 14:09):

You can use https://github.com/fgdorais/lean4-unicode-basic for now

Damiano Testa (Sep 19 2025 at 14:48):

Thank you both!

The reason for wanting this is to identify whether the name of a Mathlib author is capitalized or not. This is for an extension of the linter that checks that the copyright is well-formed.

So, it is not too important that the check is completely thorough, and I can always add exceptions.

However, I like the suggestion of checking that the character is not lowercase, as I think that currently that is enough for mathlib.

François G. Dorais (Sep 19 2025 at 21:45):

Julia Scheaffer said:

Unicode's uppercase letter category only has 1791 code points in it, a reasonable amount compared to some of the other Unicode categories.

But the Uppercase property also includes another 120 code points (still pretty small).

@Damiano Testa Note that UnicodeBasic depends on nothing and will remain that way. So it's pretty harmless as an additional requirement.

Julian Berman (Sep 20 2025 at 17:55):

Damiano Testa said:

The reason for wanting this is to identify whether the name of a Mathlib author is capitalized or not. This is for an extension of the linter that checks that the copyright is well-formed.

So, it is not too important that the check is completely thorough, and I can always add exceptions.

(Just as an opportunity to link a nice post even though you've already mentioned exceptions :) -- https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Number 17 would like a word.

Ruben Van de Velde (Sep 20 2025 at 18:03):

Because of the entire list, I'd recommend against linting people's names

Julia Scheaffer (Sep 20 2025 at 18:15):

For use as a heuristic on when names might be wrong, it doesn't seem like a bad idea. Lints are not taken as ground truth and can be easily overridden I believe.

Damiano Testa (Sep 20 2025 at 20:20):

I think that the consensus was to reduce drastically the check, so, at least for now, worrying about upper/lowercase is not an issue anymore!

Damiano Testa (Sep 20 2025 at 20:21):

Julian, the link that you posted was already suggested and would have appeared in the linter warning! :slight_smile:


Last updated: Dec 20 2025 at 21:32 UTC