Zulip Chat Archive

Stream: general

Topic: characters which can or can't be in names


Jason Rute (Jun 02 2020 at 02:21):

I want to find all possible instance of a certain name "foo". Things like x.foo, (foo), a:foo, foo.bar, etc, but not foo_bar or myfoo. Because of unicode, it is hard to determine when a name has ended (or begun). Is there a list of one of these sets of characters or rules?:

  • all characters which can't be in a name (and therefore can be used to determine when a name begins and ends)
  • all characters which can be in a name
  • the exact rules for allowable names
  • the exact rules for parsing names (if not too complicated)

If it matters, I'm more ok with false positives (it could be foo but isn't), than false negatives. I'm not concerned with if there is more than one declaration foo, like nat.foo and int.foo. That is the part I'm working on solving.

Jason Rute (Jun 02 2020 at 03:34):

For now I'm just checking that there isn't an alphanumeric character or _ on either side. This will eliminate cases like foo_bar or myfoo, but not αfoo or foo' that is probably ok for now since I will check all those anyway to make sure they are the exact foo I'm looking for.

Mario Carneiro (Jun 02 2020 at 03:38):

The allowable name characters are given by this C++: https://github.com/leanprover-community/lean/blob/master/src/util/name.cpp#L27-L56

Mario Carneiro (Jun 02 2020 at 03:39):

If you ignore the unicode, it is the regex [A-Za-z_][A-Za-z0-9_']*

Mario Carneiro (Jun 02 2020 at 03:42):

(by the way, I always look for "Coptic" to find this snippet, since of course Coptic characters don't actually come up anywhere else)

Jason Rute (Jun 02 2020 at 03:49):

Thanks!


Last updated: Dec 20 2023 at 11:08 UTC