Zulip Chat Archive

Stream: general

Topic: parsing question about `∥x∥`


Kevin Buzzard (Feb 17 2021 at 23:40):

I am surprised about the following:

import analysis.normed_space.basic

example (α : Type) [has_norm α] [has_add α] (x y : α) :
  x + y  max x y := sorry -- doesn't compile :-(

The error is

type mismatch at application
  ∥x + y∥ ≤ max
term
  max
has type
  ?m_1 → ?m_1 → ?m_1 : Type ?
but is expected to have type
  ℝ : Type
-/

In analysis.normed_space.basic we have this:

class has_norm (α : Type*) := (norm : α  )

export has_norm (norm)

notation `∥`:1024 e:1 `∥`:1 := norm e

but yet if I #print notation \|| I get

`∥`:1 _:1 `∥`:1 := has_norm.norm #0

In fact even the following file doesn't make the binding power go above 1:

def norm : Type := sorry

notation `∥`:1024 e:1 `∥`:1 := norm e

#print notation  -- `∥`:1 _:1 `∥`:1 := norm #0

What have I misunderstood here?

Kevin Buzzard (Feb 17 2021 at 23:42):

def norm : Type := sorry

notation `∥`:512 e:37 `∥`:999 := norm e

#print notation  -- `∥`:999 _:37 `∥`:999 := norm #0

Is this to be expected? I can't have the first and second \|| with different binding power? This is clearly not what the authors of analysis.normed_space.basic were expecting.

Eric Wieser (Feb 17 2021 at 23:46):

I think binding power is global per-token?

Kevin Buzzard (Feb 17 2021 at 23:47):

But ∥x + y∥ ≤ max ∥x∥ ∥y∥ is a set phrase in nonarchimedean geometry! Can we not get it to parse? ∥x + y∥ ≤ (max ∥x∥ ∥y∥) is the current workaround.

Eric Wieser (Feb 17 2021 at 23:50):

How does parsing fail if you use 1024 for both priorities?

Kevin Buzzard (Feb 17 2021 at 23:50):

`⌊`:1024 _:1 `⌋`:1 := floor #0 is the correct choice, so it seems; it's just that we are bitten by both sides being the same token.

Kevin Buzzard (Feb 17 2021 at 23:51):

with 1024 on both I get two invalid expression errors, it's hard to tell what's going wrong.

Kevin Buzzard (Feb 17 2021 at 23:52):

one red squiggle under \le, the other under ||y||

Mario Carneiro (Feb 18 2021 at 00:01):

Yeah, bracketing operators that are the same on both sides are nasty to parse, for example |x+|+f|y|+z|+z|

Mario Carneiro (Feb 18 2021 at 00:03):

I think there are vertical bar tokens in unicode with little left/right ticks on them that are an okay compromise IMO

Kevin Buzzard (Feb 18 2021 at 00:09):

Mario Carneiro said:

Yeah, bracketing operators that are the same on both sides are nasty to parse, for example |x+|+f|y|+z|+z|

You just need good spacing :-)

Kevin Buzzard (Feb 18 2021 at 00:12):

@Patrick Massot @Heather Macbeth @Sebastien Gouezel @Yury G. Kudryashov did you never have problems with parsing and \||? If not then you are presumably not too interested in the idea of replacing it with some variant of lfloor and rfloor (or any other two different tokens). What you have here in analysis.normed_space.basic is being ignored by the parser -- the 1024 is apparently overruled by the 1, which is quite a big difference! I'm surprised this issue didn't show up sooner. Or maybe it did and you know about it...

Eric Wieser (Feb 18 2021 at 00:26):

How about the "new" tokens (∥x + y∥) - I hear parts of mathlib already use those /s

Mario Carneiro (Feb 18 2021 at 00:31):

that also works here, if you write ∥x + y∥ ≤ max (∥x∥) (∥y∥)

Mario Carneiro (Feb 18 2021 at 00:35):

Suggestion: ⸡x⸠

Mario Carneiro (Feb 18 2021 at 00:36):

Note that unicode calls the left and the right, but I think it makes more sense with the quills pointing out rather than in

Eric Wieser (Feb 18 2021 at 00:50):

That's "⸡x⸠" for android users

Yury G. Kudryashov (Feb 18 2021 at 00:52):

You've pasted the same symbol, so we see the same symbol.

Yury G. Kudryashov (Feb 18 2021 at 00:53):

@Kevin Buzzard Sometimes I see parsing errors and add parentheses.

Alex J. Best (Feb 18 2021 at 01:27):

How about a subscript?

import data.real.basic
class has_norm (α : Type*) := (norm : α  )

export has_norm (norm)

notation `∥`:1024 e:1 `∥ₙ`:1 := norm e

example (α : Type) [has_norm α] [has_add α] (x y : α) :
  x + y∥ₙ  max x∥ₙ y∥ₙ := sorry

seems to work fine and looks way more mathematically reasonable than ⸡x⸠

Alex J. Best (Feb 18 2021 at 01:28):

( for norm :smile: )

Mario Carneiro (Feb 18 2021 at 01:32):

What's mathematically unreasonable about it? I think the quills are almost invisible, just barely big enough that you can tell which is which. Just look at them as vertical bars with an orientation

Mario Carneiro (Feb 18 2021 at 01:32):

FYI I got the idea from mathematica, which uses characters that look essentially like this when you use the absolute value notation

Mario Carneiro (Feb 18 2021 at 01:33):

https://reference.wolfram.com/language/ref/character/LeftBracketingBar.html

Mario Carneiro (Feb 18 2021 at 01:34):

Drawn in monospaced fonts with a small left‐pointing tee to indicate direction.

I guess they draw it as a plain vertical bar in "presentation mode"

Alex J. Best (Feb 18 2021 at 01:44):

Mario Carneiro said:

What's mathematically unreasonable about it? I think the quills are almost invisible, just barely big enough that you can tell which is which. Just look at them as vertical bars with an orientation

Well I've never seen that character used in a math paper and if I saw it out of context I would have less idea what it meant than ∥y∥ₙ, and I assume most mathematicians would feel the same. A big strength of lean for me is that I can show mathematicians who don't know lean formalized statements and quite often they can understand them, I think using less "weird" symbols helps with this psychologically.

Alex J. Best (Feb 18 2021 at 01:46):

Alex J. Best said:

( for norm :smile: )

Actually I guess one issue with n is that it might be interpreted as "some natural number"

Mario Carneiro (Feb 18 2021 at 03:03):

I would hope that the context makes it clear:

example (α : Type) [has_norm α] [has_add α] (x y : α) : x + y  max x y := sorry

Frankly I think there are much worse offenders when it comes to the set of unicode characters we use, like all the arrow variations. This is a vertical bar variation, and it's maybe not obvious what the variation means but that's going to be true for any notation - at some point you have to actually look it up or guess based on the names of theorems about it. I don't like a subscript n because then it looks parameterized, like a p-norm or something

Mario Carneiro (Feb 18 2021 at 03:05):

I think being able to visually see the difference between the left and right delimiters is also useful for human consumers. I think the mathematica design is telling: they use the plain bars for presentation but make the left/right thing visible in the monospace font so that authors don't confuse the two

Heather Macbeth (Feb 18 2021 at 03:15):

Personally, I'd rather keep the current notation, and just add parentheses when Lean can't figure it out . . . I really value that Lean code is semi-readable to outsiders, I think any step away from standard math notation should be very carefully weighed.

Heather Macbeth (Feb 18 2021 at 03:18):

@Alex J. Best's suggestion just shows up to me as blank boxes ... maybe I don't have the right fonts installed, but again, half our newcomer readers also won't?

Alex J. Best (Feb 18 2021 at 03:21):

That's very strange, they show as boxes for me too on Android (as does Mario's suggestion) but apart from the subscript I just copy pasted the usual norm bars from Kevin's post which do show on my phone. So I have no idea how those characters could have ended up different

Johan Commelin (Feb 18 2021 at 04:23):

will Lean4's parser be able to deal with the problem better? I got the impression that it's quite a bit more powerful. If so, I wouldn't refactor the notation just to switch back in half a year.

Sebastien Gouezel (Feb 18 2021 at 07:12):

I'm on the same line as Heather and Yury here. We know that the current notation works fine if we are ready to add parentheses when it doesn't :-) And it has the very valuable advantage of being self-explanatory. So I wouldn't change it.

Eric Wieser (Feb 18 2021 at 07:47):

As less offensive notation, could a subscript go on the closing ||?

Kevin Buzzard (Feb 18 2021 at 07:48):

Yes I also very much value the fact that right now it's readable and any change will make it less so. This is very much an area where the CS people don't seem to understand the value of this, I had some Twitter spat recently with a few people who argued that all unicode notation was bad because it just added an extra layer of unnecessary noise. Ok if it doesn't cause too many problems I think we should leave it. I'm still very much interested in the question about whether this issue can be fixed in lean 4

Eric Wieser (Feb 18 2021 at 07:49):

Like, perhaps xr\|x\|_r with r standing for R\mathbb{R}

Kevin Buzzard (Feb 18 2021 at 07:50):

This is the other option but we've just heard from the people who use this notation a lot that it doesn't seem to be a problem in practice

Sebastian Ullrich (Feb 18 2021 at 11:25):

Johan Commelin said:

will Lean4's parser be able to deal with the problem better? I got the impression that it's quite a bit more powerful. If so, I wouldn't refactor the notation just to switch back in half a year.

It turns out that this works out of the box in Lean 4.

axiom max {α} : α  α  α
axiom norm {α} : α  α
notation "∥" e "∥" => norm e

#check fun (x y : Nat) => x + y  max x y

I was a bit surprised by that, but the secret is that function application in Lean 4 has to include whitespace. So y∥ is not confused as an incomplete application.

Patrick Massot (Feb 18 2021 at 11:32):

This is somehow an accident though. It doesn't answer the question: could we assign different binding power to the opening and closing ? Or is it no longer a meaningful question?

Sebastian Ullrich (Feb 18 2021 at 11:44):

Correct, it is no longer meaningful because precedences are now associated with parsers, i.e. e and the whole notation. Nor would an extended Lean 3 model help: You have to tell the parser that y∥ is not a valid start point of the notation, but max ∥ is. That cannot be solved with precedences alone. The parser (or any such parser) is not even thinking about the precedence of the closing at this point.

Kevin Buzzard (Feb 18 2021 at 11:47):

This is good news, presumably? Are you saying that this whole "|":a e:b "|":c thing with three numbers was a hack all along? Does Lean 4 use a Pratt parser? My understanding of binding is based on this.

Sebastian Ullrich (Feb 18 2021 at 11:51):

It does, I've written a bit about it at https://leanprover.github.io/lean4/doc/syntax.html. Do tell me if that's intelligible for non-CS audiences :) .

Sebastian Ullrich (Feb 18 2021 at 11:53):

Kevin Buzzard said:

This is good news, presumably? Are you saying that this whole "|":a e:b "|":c thing with three numbers was a hack all along?

Well, now it's two numbers (the expanded form is notation:max "∥" e:0 "∥" => norm e), but at least they're now local to the notation. I'm not sure right now if the third number ever mattered in Lean 3 either.

Kevin Buzzard (Feb 18 2021 at 12:03):

While introducing new notations is a relatively rare feature in programming languages and sometimes even frowned upon because of its potential to obscure code, it is an invaluable tool in formalization for expressing established conventions and notations of the respective field succinctly in code.

Did you see the Twitter conversation I had about this? Andrej Bauer asked a maths question in Coq and my initial reaction was "that is so ugly to read, here is how it looks in Lean, mathematicians can read my version" and then several people popped up saying "unicode is just a stupid gimmick, it makes people's lives worse" and I was just dumbfouded. I will say it again -- on several occasions mathematicians have said to me that they are attracted to Lean because it looks more readable to them than the other languages. I think it is marvellous that we are going in this direction. The CS people were arguing that it adds an extra layer of obfuscation. What they don't seem to realise is that for mathematicians it is serious unobfuscation. It is a translation which is bridging the gap between our two worlds.

Eric Wieser (Feb 18 2021 at 12:05):

Heather Macbeth said:

Alex J. Best's suggestion just shows up to me as blank boxes ... maybe I don't have the right fonts installed, but again, half our newcomer readers also won't?

Mathlib is full of notation that doesn't render properly in zulip on android - even linear maps often don't. I don't think we should consider zulip android monospace font bugs in mathlibs design. Discounting the blank box complaint, do you still prefer (∥x + y∥) over ∥x + y∥ₙ?

Gabriel Ebner (Feb 18 2021 at 12:06):

Is that a zulip bug though? It seems to be limited to certain android versions. (IIRC you mentioned that you couldn't see the characters on github either.)

Kevin Buzzard (Feb 18 2021 at 12:09):

This is fixed in Lean 4 so for me the matter of whether we should change notation is now closed.

Sebastian Ullrich (Feb 18 2021 at 12:37):

Kevin Buzzard said:

Did you see the Twitter conversation I had about this?

I did, and I agree with you (modulo choice of words :) ). I'm pretty sure though that this wasn't "CS people" but (mostly) "Coq people". No-one coming from Agda would challenge use of Unicode, while in Isabelle they're not even content with Unicode but use their own extended markup, font, and text rendering (so if you want an ITP that actually allows you to put \N in the subscript...).

Mario Carneiro (Feb 18 2021 at 20:51):

Sebastian Ullrich said:

the secret is that function application in Lean 4 has to include whitespace. So y∥ is not confused as an incomplete application.

Does this mean that folks can't pretend lean has C style function call syntax anymore, e.g. f(x, y, z) (which works as long as f : A x B x C -> D)?

Mario Carneiro (Feb 18 2021 at 20:53):

I'm not sure how common this is, but I routinely see people posting lean 3 proofs using things like P(sin(x))

Sebastian Ullrich (Feb 18 2021 at 21:11):

Mario Carneiro said:

Sebastian Ullrich said:

the secret is that function application in Lean 4 has to include whitespace. So y∥ is not confused as an incomplete application.

Does this mean that folks can't pretend lean has C style function call syntax anymore, e.g. f(x, y, z) (which works as long as f : A x B x C -> D)?

Even better, they can now implement that syntax properly without clashing with the internal application syntax, and then reevaluate their life choices.


Last updated: Dec 20 2023 at 11:08 UTC