Zulip Chat Archive

Stream: lean4

Topic: invalid atom


Patrick Massot (Dec 30 2024 at 21:28):

I’m trying to bump the Lean version of Verbose Lean from 4.10.0 to 4.15.0-rc1 and the code

declare_syntax_cat providersDescr
syntax providersField := term ": [" ident,* "]"

now triggers an invalid atom error. Any idea what’s going on?

Edward van de Meent (Dec 30 2024 at 21:29):

quick fuzzing seems to suggest the issue is that there is a space in the string?

Edward van de Meent (Dec 30 2024 at 21:30):

perhaps a feasible workaround is instead using ":""["

Edward van de Meent (Dec 30 2024 at 21:31):

possibly along with some parsers making sure there is no linebreak inbetween

Patrick Massot (Dec 30 2024 at 21:31):

Is this a known regression? Or is there a good reason to give up this syntax?

Edward van de Meent (Dec 30 2024 at 21:38):

i can't find anything in the prospective release notes...

Yaël Dillies (Dec 30 2024 at 21:39):

Patrick, maybe out of scope for this discussion but why are you bumping five Lean versions at once?

Yaël Dillies (Dec 30 2024 at 21:40):

It might help here to know which release broke your syntax

Edward van de Meent (Dec 30 2024 at 21:40):

(it's the latest rc that breaks it)

Edward van de Meent (Dec 30 2024 at 21:40):

as evidenced by swapping between stable and latest mathlib on the playground

Edward van de Meent (Dec 30 2024 at 21:46):

the code responsible for atom validation is here...

Edward van de Meent (Dec 30 2024 at 21:50):

PR which added the breaking change: lean4#6012

Edward van de Meent (Dec 30 2024 at 21:52):

(CC @David Thrane Christiansen) i'm not sure why internal spaces are no longer allowed, it was not mentioned in the linked issue...

Patrick Massot (Dec 30 2024 at 21:58):

Thanks a lot for investigating this Edward! The PR message does say “Additionally, atoms with internal whitespace are no longer allowed.”

Edward van de Meent (Dec 30 2024 at 21:59):

it does, but no motivation... i'm reading the PR comments right now, there seems to be info there :looking:

Patrick Massot (Dec 30 2024 at 22:02):

Yes, there was a lot of discussion in the PR review.

Edward van de Meent (Dec 30 2024 at 22:02):

i'm guessing the intended fix is using term " : " "[" ident,* "]"

Edward van de Meent (Dec 30 2024 at 22:03):

making maximal use of prettyprinting hints

David Thrane Christiansen (Dec 30 2024 at 22:05):

I'm off on vacation and not near a computer - this is phone messaging. Sorry in advance for lack of links etc. The problem with spaces in atoms is that they are a footgun that makes the syntax less predictable - in this one place, the user must write exactly one space. Everywhere else in the language, two spaces or even a block comment would have been ok. And this was an easy mistake for the syntax author to make.

David Thrane Christiansen (Dec 30 2024 at 22:06):

For instance, open private was accidentally a syntax error for a long time

Edward van de Meent (Dec 30 2024 at 22:08):

thanks for clarifying!
it would be nice if this change is given a bit more attention in the proper release notes...
other than that, enjoy vacation!

David Thrane Christiansen (Dec 30 2024 at 22:10):

It seems reasonable to let edsl authors express this behavior, but I couldn't find a real example that wasn't actually a mistake. Is this an example where you intended to require that users write precisely one space character there?

Patrick Massot (Dec 30 2024 at 22:12):

No, I definitely didn’t intend to force exactly one space.

Patrick Massot (Dec 30 2024 at 22:12):

Could you confirm that the proper fix is to use term " : " "[" ident,* "]"?

Patrick Massot (Dec 30 2024 at 22:13):

I will have a lot of fixing to do with this change…

Patrick Massot (Dec 30 2024 at 22:14):

Because I also have a lot of things like:

elab "Let's" " first prove that " stmt:term : tactic =>
  anonymousSplitLemmaTac stmt

Patrick Massot (Dec 30 2024 at 22:15):

And there I’m not quite sure how to tell the pretty-printer where to put spaces (and it’s crucial that the pretty-printer knows this since it is used a lot in the help command and the widgets).

Edward van de Meent (Dec 30 2024 at 22:16):

elab "Let's ""first ""prove ""that " stmt:term : tactic =>
  anonymousSplitLemmaTac stmt

should work i think?

Patrick Massot (Dec 30 2024 at 22:17):

This is what I’m trying, although it doesn’t look pretty.

Patrick Massot (Dec 30 2024 at 22:18):

I guess I will soon write a macro/elaborator doing that for me. Because why couldn’t we take this as an opportunity for more fun with macros?

Patrick Massot (Dec 30 2024 at 22:20):

I think I fixed https://github.com/PatrickMassot/verbose-lean4/blob/master/Verbose/English/Lets.lean. Only twenty more files like this to go…

Patrick Massot (Dec 30 2024 at 22:21):

But I’ll get some sleep first.

David Thrane Christiansen (Dec 30 2024 at 22:27):

Yes, that's the intended fix (which will also allow users to use whitespace more flexibly in that context). Sorry to have made such a pile of work over the holidays - perhaps there's a regex find-replace that will do it for you?

Eric Wieser (Dec 30 2024 at 22:28):

Patrick Massot said:

I guess I will soon write a macro/elaborator doing that for me. Because why couldn’t we take this as an opportunity for more fun with macros?

Well, you're in danger of wanting to write a custom parser alias so that you can write withSpaces("foo bar") in your elab line, but initialize add_parser_alias was the very thing that has caused all of the Lean CI scripts to start crashing recently!

Edward van de Meent (Dec 30 2024 at 23:15):

fwiw, i've managed to write a function which splits a string into a list of strings by whitespace, such that if the string isn't only whitespace, folding it with appending will return the original string.

def makeAtomsOf' (s:String) : List String :=
  s.data.splitBy (!·.isWhitespace || ·.isWhitespace) |>.map String.mk

def makeAtomsOf (s:String) : List String := match makeAtomsOf' s with
  | [] => []
  | [a] => if a.all (·.isWhitespace) then [] else [a]
  | res@(a::b::s) => if a.all (·.isWhitespace) then ((a++b)::s)
    else res

#eval makeAtomsOf "   foo sorry    sofosrs sl aooj" -- ["   foo ", "sorry    ", "sofosrs ", "sl ", "aooj"]

Edward van de Meent (Dec 30 2024 at 23:16):

now to make a macro using this...

Edward van de Meent (Dec 30 2024 at 23:39):

apparently, it's hard to make a macro for something that doesn't have a syntaxnodekind.... which is unfortunate, in this case

Patrick Massot (Dec 31 2024 at 09:54):

I tried to play with register_parser_alias but I think this is above my pay grade. I’ll fix things by hand and see whether someone more competent with Lean parsing find this is an interesting challenge.

Patrick Massot (Dec 31 2024 at 09:57):

Actually I will probably backtrack and update only to lean4.14.0 which is hopefully enough.

Patrick Massot (Dec 31 2024 at 10:04):

Expect this plan means fighting lake :cry: Putting a version field in the lakefile.toml’s require section for mathlib and expect this would do anything was so naive…

Patrick Massot (Dec 31 2024 at 10:06):

I have

[[require]]
name = "mathlib"
git = "https://github.com/leanprover-community/mathlib4.git"
version = "v4.14.0"

in my lakefile and leanprover/lean4:v4.14.0 in my lean-toolchain and the first thing lake update tells me is

info: mathlib: checking out revision '5bf0f3993d81f682aab73c6129b25b49da7144fa'
info: updating toolchain to 'leanprover/lean4:v4.15.0-rc1'

Ruben Van de Velde (Dec 31 2024 at 10:08):

I think it's "rev", not "version"

Patrick Massot (Dec 31 2024 at 10:09):

Oh, I was not careful enough when reading https://github.com/leanprover/lean4/tree/master/src/lake#toml-require

Patrick Massot (Dec 31 2024 at 10:09):

Of course the syntax is not the same for reservoir require and git require, and there is no warning when using an unexpected toml field.

Patrick Massot (Dec 31 2024 at 10:10):

And indeed it seems to work with rev, thanks Ruben!

Patrick Massot (Dec 31 2024 at 11:13):

I filled lean4#6482

Kyle Miller (Jan 03 2025 at 17:25):

Here's a solution @Patrick Massot. For each syntax-defining command in Lean, you can copy the attribute command below to add a "molecule splitter" macro for it. The macro searches the entire command syntax for a Lean.Parser.Syntax.atom that's actually a "molecule" (for example " a b c ") and then replaces it with group(" a " "b " "c").

import Lean

open Lean

/--
Splits a "molecule" into atoms. For example,
```
splitMolecule "  a b  c " = #["  a ", "b  ", "c "]
```
-/
partial def splitMolecule (s : String) : Array String :=
  let it := s.mkIterator
  go #[] it (it.find (!·.isWhitespace))
where
  go (atoms : Array String) (left right : String.Iterator) : Array String :=
    let right := right |>.find (·.isWhitespace) |>.find (!·.isWhitespace)
    if left == right then
      atoms
    else
      let atoms := atoms.push (left.extract right)
      go atoms right right

def isStxMolecule (p : Syntax) : Bool :=
  p.isOfKind ``Lean.Parser.Syntax.atom
    && if let some atom := p[0].isStrLit? then atom.trim.any Char.isWhitespace else false

def expandStxMolecules? (s : Syntax) : MacroM (Option Syntax) := do
  unless (s.find? isStxMolecule).isSome do
    return none
  s.replaceM fun p => do
    if isStxMolecule p then
      if let some s := p[0].isStrLit? then
        withRef p do
          let atomStrings := splitMolecule s
          let atoms  atomStrings.mapM fun atomString => `(stx| $(quote atomString):str)
          `(stx| group($[$atoms]*))
      else
        Macro.throwUnsupported
    else
      return none

def expandStxMolecules : Lean.Macro := fun s => do
  ( expandStxMolecules? s).getDM Macro.throwUnsupported

attribute [macro Lean.Parser.Command.syntax] expandStxMolecules
attribute [macro Lean.Parser.Command.syntaxAbbrev] expandStxMolecules

syntax term " : [" term "]" : term
#check 2 :   [ 3 ]
/-
elaboration function for '«term_:[_]»' has not been implemented
  2 : [3]
-/
macro_rules
  | `($a : [ $b ]) => `([$a, $b])
#check 2 :   [ 3 ]
-- [2, 3] : List Nat

syntax s := term " => [" term "]"
#check `(s| 2 => [ 3 ])

Patrick Massot (Jan 03 2025 at 17:37):

This looks really nice, thank you very much!

Patrick Massot (Jan 08 2025 at 08:56):

@Kyle Miller, thanks to Kim’s work last night, I’m able to really test your molecule splitter. It works but it creates a lot of new keywords. In particular I have now summoned the a bug from the dead. Do you see any way to avoid that?

Johan Commelin (Jan 08 2025 at 09:00):

In French you write funny accents on top of the "a" anyways, right?

David Thrane Christiansen (Jan 08 2025 at 09:02):

I'm taking a look at what kind of refactoring is needed in verbose-lean4 to account for all this. Non-reserved keywords could be used here for most of it, I believe

Patrick Massot (Jan 08 2025 at 09:05):

The a bug is indeed resurrected only for English-speaking users.

Patrick Massot (Jan 08 2025 at 09:05):

Thanks David!

David Thrane Christiansen (Jan 08 2025 at 09:05):

Sorry for having made more work for you here...

Patrick Massot (Jan 08 2025 at 09:06):

My hope is that Kyle’s molecule splitting magic function can be tweaked to avoid that.

Patrick Massot (Jan 08 2025 at 09:07):

You can see it also poses issues like at https://github.com/PatrickMassot/verbose-lean4/commit/de5761cefee99be45501d489db28302e786c2d8e#diff-afbd756aded38e92e4b93f913ecc558c3c56ee8ebb1b7880a42677e9b75d50ebL13-R15

Patrick Massot (Jan 08 2025 at 09:08):

where notation3 u " is increasing" => increasing u had to be changed to notation3 u " is increasing" => increasing_seq u after renaming the increasing declaration because the molecule splitted is increasing was introducing an increasing keyword.

Sebastian Ullrich (Jan 08 2025 at 09:10):

The first word after a term/... should be a reserved keyword because we need it to tell the term parser to stop there but the remainder could be non-reserved, yes

Patrick Massot (Jan 08 2025 at 09:11):

Do you see a way to fix #lean4 > invalid atom @ 💬

David Thrane Christiansen (Jan 08 2025 at 09:11):

Yes indeed, I'm experimenting with it right now :)

David Thrane Christiansen (Jan 08 2025 at 09:17):

This quick hack seems to do the trick:

import Lean

open Lean

/--
Splits a "molecule" into atoms. For example,
```
splitMolecule "  a b  c " = #["  a ", "b  ", "c "]
```
-/
partial def splitMolecule (s : String) : Array String :=
  let it := s.mkIterator
  go #[] it (it.find (!·.isWhitespace))
where
  go (atoms : Array String) (left right : String.Iterator) : Array String :=
    let right := right |>.find (·.isWhitespace) |>.find (!·.isWhitespace)
    if left == right then
      atoms
    else
      let atoms := atoms.push (left.extract right)
      go atoms right right

def isStxMolecule (p : Syntax) : Bool :=
  p.isOfKind ``Lean.Parser.Syntax.atom
    && if let some atom := p[0].isStrLit? then atom.trim.any Char.isWhitespace else false



def expandStxMolecules? (s : Syntax) : MacroM (Option Syntax) := do
  unless (s.find? isStxMolecule).isSome do
    return none
  s.replaceM fun p => do
    if isStxMolecule p then
      if let some s := p[0].isStrLit? then
        withRef p do
          let atomStrings := splitMolecule s
          if h : atomStrings.size > 0 then
            let firstAtom  `(stx|$(quote atomStrings[0]):str)
            let restAtoms  (atomStrings.extract 1 atomStrings.size).mapM fun atomString =>
              if atomString.all (fun c => c.isAlpha || c == ' ') then
                `(stx| &$(quote atomString):str)
              else `(stx| $(quote atomString):str)
            `(stx| group($firstAtom $[$restAtoms]*))
          else return none
      else
        Macro.throwUnsupported
    else
      return none

def expandStxMolecules : Lean.Macro := fun s => do
  ( expandStxMolecules? s).getDM Macro.throwUnsupported

attribute [macro Lean.Parser.Command.syntax] expandStxMolecules
attribute [macro Lean.Parser.Command.syntaxAbbrev] expandStxMolecules

syntax term " : [" term "]" : term
#check 2 :   [ 3 ]
/-
elaboration function for '«term_:[_]»' has not been implemented
  2 : [3]
-/
macro_rules
  | `($a : [ $b ]) => `([$a, $b])
#check 2 :   [ 3 ]
-- [2, 3] : List Nat

syntax s := term " => [" term "]"
#check `(s| 2 => [ 3 ])

syntax "Here's a great " term:max " of type " term:max : term
macro_rules
  | `(Here's a great $tm:term of type $ty:term) => `(show $ty from $tm)

-- This works:
def a := Here's a great 33 of type Nat

David Thrane Christiansen (Jan 08 2025 at 09:18):

It just reserves the first keyword in a "molecule". This is may not be specific enough, but it should allow for some progress.

Sebastian Ullrich (Jan 08 2025 at 09:20):

@Patrick Massot Note that this change should also provide a (minor/major?) UX improvement: on partial input such as f is, I assume you previously got an error message such as "unknown identifier 'is'", while now it should list exactly the words that may follow is

Patrick Massot (Jan 08 2025 at 09:22):

Sebastian, I’m well aware that this whole change is globally a good idea. The fact that the old style was enforcing having exactly one space between words was also very bad (I never noticed it but I’m sure students would have find it since they have a magical ability to trigger all possible issues).

David Thrane Christiansen (Jan 08 2025 at 09:22):

In particular, there are still opportunities for ambiguity with this version:

syntax "Here's a great " term " of type " term : term
macro_rules
  | `(Here's a great $tm:term of type $ty:term) => `(show $ty from $tm)

syntax "Here's a " term " of type " term : term
macro_rules
  | `(Here's a $tm:term of type $ty:term) => `(show $ty from $tm)

def great := @id

/--
error: ambiguous, possible interpretations ⏎
  let_fun this := great 33;
  this : Nat

  let_fun this := 33;
  this : Nat
-/
#guard_msgs in
def a := Here's a great 33 of type Nat

David Thrane Christiansen (Jan 08 2025 at 09:26):

I can confirm that the parser errors here are quite civilized:

#eval Here's a great 33 of

reports

unexpected end of input; expected 'type'

Patrick Massot (Jan 08 2025 at 09:31):

It creates issues elsewhere unfortunately.

David Thrane Christiansen (Jan 08 2025 at 09:34):

Are they ambiguity issues? Based on my experiments this morning, I think there's some refactoring needed to get the right precedence annotations in and avoid ambiguous parses. It's doable but not so fun. I'll look at adding a backwards compat option for this change to make it into a non-emergency situation for you.

Patrick Massot (Jan 08 2025 at 09:34):

And the error message is the dreaded unexpected syntax which I’ve never found very helpful.

Patrick Massot (Jan 08 2025 at 09:35):

It’s not really an emergency issue because, thanks to Kim’s work last night, I can rather easily teach on Lean 4.14.0 this year. But this is not a future proof solution.

Patrick Massot (Jan 08 2025 at 09:35):

If you want to help even more then you can try your fix on verbose Lean and see if you can get the library to build.

Patrick Massot (Jan 08 2025 at 09:36):

The molecule code is in https://github.com/PatrickMassot/verbose-lean4/blob/master/Verbose/Tactics/Common.lean

Patrick Massot (Jan 08 2025 at 09:36):

And it breaks https://github.com/PatrickMassot/verbose-lean4/blob/master/Verbose/French/Calc.lean

David Thrane Christiansen (Jan 08 2025 at 10:19):

I had a quick house thing to take care of. Looking at the files, it doesn't seem that they're broken anymore

Patrick Massot (Jan 08 2025 at 10:40):

Very mysterious. In the mean time I moved to another computer and it also seems to work.

David Thrane Christiansen (Jan 08 2025 at 10:45):

Well then, good to hear!

David Thrane Christiansen (Jan 08 2025 at 13:21):

So, just to confirm: the backwards-compatiblity option is no longer needed, because verbose-lean4 works now, right?

Patrick Massot (Jan 08 2025 at 13:40):

Yes, it seems to work now. Thank you very much to all people who helped.

Patrick Massot (Jan 10 2025 at 21:03):

Actually the white space trick seems to break the info view, as in:

import Lean

section
open Lean Parser Tactic Meta Elab Tactic Option

/--
Splits a "molecule" into atoms. For example,
`splitMolecule "  a b  c " = #["  a ", "b  ", "c "]`
-/
partial def splitMolecule (s : String) : Array String :=
  let it := s.mkIterator
  go #[] it (it.find (!·.isWhitespace))
where
  go (atoms : Array String) (left right : String.Iterator) : Array String :=
    let right := right |>.find (·.isWhitespace) |>.find (!·.isWhitespace)
    if left == right then
      atoms
    else
      let atoms := atoms.push (left.extract right)
      go atoms right right

def isStxMolecule (p : Syntax) : Bool :=
  p.isOfKind ``Lean.Parser.Syntax.atom
    && if let some atom := p[0].isStrLit? then atom.trim.any Char.isWhitespace else false

def expandStxMolecules? (s : Syntax) : MacroM (Option Syntax) := do
  unless (s.find? isStxMolecule).isSome do
    return none
  s.replaceM fun p => do
    if isStxMolecule p then
      if let some s := p[0].isStrLit? then
        withRef p do
          let atomStrings := splitMolecule s
          let atoms  atomStrings.mapM fun atomString => `(stx| $(quote atomString):str)
          `(stx| group($[$atoms]*))
      else
        Macro.throwUnsupported
    else
      return none

def expandStxMolecules : Lean.Macro := fun s => do
  ( expandStxMolecules? s).getDM Macro.throwUnsupported

attribute [macro Lean.Parser.Command.syntax] expandStxMolecules
attribute [macro Lean.Parser.Command.syntaxAbbrev] expandStxMolecules

end

def even (f : Int  Int) :=  x, f (-x) = f x

notation:50 f:80 " is even " => even f

example (f g : Int  Int) (hf : f is even) : g (f (-5)) = g (f 5) := by
  have hf5 : f (-5) = f 5 := hf 5
  rw [hf5]

Patrick Massot (Jan 10 2025 at 21:04):

It seems Lean tries to display the notation in the infoview and it goes very wrong.

Patrick Massot (Jan 10 2025 at 21:04):

By the way, @Julian Berman this is a case where lean.nvim simply displays nothing at all while VSCode shows an error message.

Patrick Massot (Jan 10 2025 at 21:05):

The error message is: “Error updating: Error fetching goals: Rpc error: InternalError: parenthesize: uncaught backtrack exception. Try again.

Kyle Miller (Jan 10 2025 at 21:08):

I was looking into fixing this earlier. The current "molecule splitter" works by creating a group(...) node with the atomized molecule. Somehow either the default parenthesizer can't handle groups, or this code is accidentally breaking some assumption for how notation is supposed to work.

The expandStxMolecules? function could instead splice-in the atoms into the surrounding syntax. This could break things in some contexts, but for syntax/notation it should be fine...

Patrick Massot (Jan 10 2025 at 21:11):

Maybe this requires @Sebastian Ullrich’s input then?

Kyle Miller (Jan 10 2025 at 21:31):

It's a matter of some more macros @Patrick Massot

import Lean

section
open Lean Parser Tactic Meta Elab Tactic Option

/--
Splits a "molecule" into atoms. For example,
`splitMolecule "  a b  c " = #["  a ", "b  ", "c "]`
-/
partial def splitMolecule (s : String) : Array String :=
  let it := s.mkIterator
  go #[] it (it.find (!·.isWhitespace))
where
  go (atoms : Array String) (left right : String.Iterator) : Array String :=
    let right := right |>.find (·.isWhitespace) |>.find (!·.isWhitespace)
    if left == right then
      atoms
    else
      let atoms := atoms.push (left.extract right)
      go atoms right right

def isStxMolecule (p : Syntax) : Bool :=
  p.isOfKind ``Lean.Parser.Syntax.atom
    && if let some atom := p[0].isStrLit? then atom.trim.any Char.isWhitespace else false

def expandStxMolecules? (s : Syntax) : MacroM (Option Syntax) := do
  unless (s.find? isStxMolecule).isSome do
    return none
  s.replaceM fun p => do
    if isStxMolecule p then
      if let some s := p[0].isStrLit? then
        withRef p do
          let atomStrings := splitMolecule s
          let atoms  atomStrings.mapM fun atomString => `(stx| $(quote atomString):str)
          `(stx| group($[$atoms]*))
      else
        Macro.throwUnsupported
    else
      return none

def expandStxMolecules : Lean.Macro := fun s => do
  ( expandStxMolecules? s).getDM Macro.throwUnsupported

attribute [macro Lean.Parser.Command.syntax] expandStxMolecules
attribute [macro Lean.Parser.Command.syntaxAbbrev] expandStxMolecules

def isNotationItemMolecule (p : Syntax) : Bool :=
  if let some atom := p.isStrLit? then atom.trim.any Char.isWhitespace else false

/-
@[builtin_command_parser] def «notation»    := leading_parser
  optional docComment >> optional Term.«attributes» >> Term.attrKind >>
  "notation" >> optPrecedence >> optNamedName >> optNamedPrio >> many notationItem >> darrow >> termParser

item 7 is the `many notationItem`
-/
def expandNotationMolecules : Lean.Macro := fun s => do
  let items := s[7].getArgs
  unless items.any isNotationItemMolecule do
    Macro.throwUnsupported
  let mut items' : Array Syntax := #[]
  for item in items do
    if let some s := item.isStrLit? then
      for atom in splitMolecule s do
        items' := items'.push <|  withRef item `(Command.notationItem| $(quote atom):str)
    else
      items' := items'.push item
  return s.setArg 7 (mkNullNode items')

attribute [macro Lean.Parser.Command.notation] expandNotationMolecules

end

def even (f : Int  Int) :=  x, f (-x) = f x

notation:50 f:80 " is even " => even f

example (f g : Int  Int) (hf : f is even) : g (f (-5)) = g (f 5) := by
  have hf5 : f (-5) = f 5 := hf 5
  /-
  f g : Int → Int
  hf : f is even
  hf5 : f (-5) = f 5
  ⊢ g (f (-5)) = g (f 5)
  -/
  rw [hf5]

Patrick Massot (Jan 10 2025 at 22:42):

Fantastic, thank you very much Kyle! I’m very happy since I grew quite fond of tactic states over the years.

Kyle Miller (Jan 10 2025 at 22:45):

Yeah, tactic states are a real nice-to-have :-)

(By the way, this isn't including David's improvement from #lean4 > invalid atom @ 💬, which marks some of the atoms with &)

Patrick Massot (Jan 10 2025 at 23:02):

Kyle Miller said:

(By the way, this isn't including David's improvement from #lean4 > invalid atom @ 💬, which marks some of the atoms with &)

This is super weird. I guess I messed up with git because I’m sure I incorporated those changes at some point.

Julian Berman (Jan 10 2025 at 23:03):

Patrick Massot said:

By the way, Julian Berman this is a case where lean.nvim simply displays nothing at all while VSCode shows an error message.

oh fun, I've never seen that before (in VSCode), that's not a diagnostic it's the extension obviously saying that talking to the language server failed quite hard. We catch that (and I think log it) but don't show those in the infoview. Do you want to (== think everyone should) see that in the infoview or is a logging message enough if I make it visible?

Patrick Massot (Jan 10 2025 at 23:03):

Oh no I see. I told David at that time that his fix broke other stuff and he couldn’t reproduce. But that’s precisely because I forgot to push. Now it’s breaking things again :sad:

Patrick Massot (Jan 10 2025 at 23:05):

Julian, I think the totally blank infoview panel is not great in this case because there is no hint about what’s going on. Of course the error message is completely unusable but at least it says Lean is not dead.

Patrick Massot (Jan 10 2025 at 23:06):

This time I pushed David’s code to https://github.com/PatrickMassot/verbose-lean4/tree/david_fix, and indeed the lib doesn’t build with it.

David Thrane Christiansen (Jan 11 2025 at 00:21):

It's too late for me to look tonight, but I'll take a look as soon as I can

David Thrane Christiansen (Jan 11 2025 at 19:08):

OK, I just created a PR against that branch that makes it build for me. I got rid of all the auto-splitting, and just went in and split the atoms manually in all the files. I set a to be a non-reserved keyword, but there may be others that should be it. The library builds, and clicking around in the demos does what I expect, but I don't know it well. Please let me know if there's something else to look at!

Patrick Massot (Jan 11 2025 at 19:11):

This is very kind, but it doesn’t seem really future proof. Are you really convinced there is no way to automate this and people should simply try putting & by hand until the parser is happy?

David Thrane Christiansen (Jan 11 2025 at 19:22):

I'd rather suggest using &selectively on atoms that should remain valid as identifiers, like a. In cases where this leads to ambiguous parses, I'd suggest refactoring the grammar if possible, or using precedence annotations to remove the ambiguity.

David Thrane Christiansen (Jan 11 2025 at 19:26):

Any automated solution would need something like this too, really - ambiguity is a property of the resulting syntax. But it should also be possible to whitelist a set of atoms to not reserve, rather than doing it positionally like in my change to Kyle's macro. That should give a result equivalent to the changes I implemented in the branch, but it was unclear to me that this actually made the library easier to develop. I could try that approach too if you'd like to see it.

Patrick Massot (Jan 11 2025 at 19:50):

How did you choose to put & in your PR?

David Thrane Christiansen (Jan 11 2025 at 19:52):

I put it before tokens that seemed likely to be used as identifiers elsewhere.

David Thrane Christiansen (Jan 11 2025 at 19:54):

In practice, this set will likely need some adjustment, because my knowledge is far from global!

David Thrane Christiansen (Jan 11 2025 at 19:55):

If you have a corpus of assignments and solutions using this that you can send me in private, I'll give it another pass to make sure they do what they should

Patrick Massot (Jan 11 2025 at 20:14):

Thanks but I really want to avoid a solution where molecules are split by hand. I’ll try to come up with a list of atoms to decorate with & and see if this is good enough.

David Thrane Christiansen (Jan 11 2025 at 20:16):

Ok, when I get the chance I'll make you another version of the macro

Patrick Massot (Jan 11 2025 at 20:18):

I’ll try to do it tonight.

David Thrane Christiansen (Jan 11 2025 at 20:20):

No rush - I can tweak the macro with just a guess of what should be in it, and you can adjust as needed

David Thrane Christiansen (Jan 11 2025 at 21:26):

OK, I just pushed an alternative PR that builds locally, and in which the infoview works. This one has an explicit list of tokens to not reserve when splitting; it doesn't allow them to be the first (because of ambiguity issues). I hope this works well for you.

David Thrane Christiansen (Jan 11 2025 at 21:30):

When things settle down, I'd like to hear more about why writing multiple atoms in a row is undesirable. I'd assumed it was that you didn't have time/energy to do the splitting, which is why I just went and did it, but now it sounds like it's something else too.

If it were my project, then I probably wouldn't consider the the kind of metaprogramming done to do automatic atom splitting here to be worth the complexity cost in the code, but perhaps there are further reasons to do so that I'm not aware of?

David Thrane Christiansen (Jan 11 2025 at 21:30):

In any case, let me know how it goes!

Patrick Massot (Jan 11 2025 at 21:35):

Thanks a lot!

Patrick Massot (Jan 11 2025 at 21:35):

Does the full library builds with this?

Patrick Massot (Jan 11 2025 at 21:36):

Is there a reason to turn notation3 into notation? Note I didn’t have any specific reason to use notation3, it was simply because in my mind it means a better version of notation.

David Thrane Christiansen (Jan 11 2025 at 21:37):

No prob :-)

It does this:

$ lake build
Build completed successfully.

David Thrane Christiansen (Jan 11 2025 at 21:38):

I switched notation commands because @Kyle Miller's infoview fix was for the built-in notation command rather than notation3. The infoview was still throwing parenthesizer errors without it.

Patrick Massot (Jan 11 2025 at 21:39):

David Thrane Christiansen said:

When things settle down, I'd like to hear more about why writing multiple atoms in a row is undesirable. I'd assumed it was that you didn't have time/energy to do the splitting, which is why I just went and did it, but now it sounds like it's something else too.

The only reason is I still hope some people will create translations to different languages (I’ve heard many people claiming this, but it happened only once and the translation is not maintained). I don’t want to scare those people.

David Thrane Christiansen (Jan 11 2025 at 21:40):

That makes sense. Would a code action/quickfix to do the split in the source file address this concern?

Patrick Massot (Jan 11 2025 at 21:41):

Probably. It’s difficult to be sure.

Kyle Miller (Jan 11 2025 at 21:45):

The notation3 version of the macro should be just like expandNotationMolecules, but you change the 7s to 8s

Patrick Massot (Jan 11 2025 at 21:52):

Do you know what should go in attribute [macro Lean.Parser.Command.notation] expandNotationMolecules ?

Kyle Miller (Jan 11 2025 at 22:00):

attribute [macro Mathlib.Notation3.notation3] expandNotation3Molecules

Patrick Massot (Jan 11 2025 at 22:25):

It doesn’t seem to work.

Patrick Massot (Jan 11 2025 at 22:28):

Maybe I misunderstood. I tried

def expandNotation3Molecules : Lean.Macro := fun s => do
  let items := s[8].getArgs
  unless items.any isNotationItemMolecule do
    Macro.throwUnsupported
  let mut items' : Array Syntax := #[]
  for item in items do
    if let some s := item.isStrLit? then
      for atom in splitMolecule s do
        items' := items'.push <|  withRef item `(Command.notationItem| $(quote atom):str)
    else
      items' := items'.push item
  return s.setArg 8 (mkNullNode items')

attribute [macro Mathlib.Notation3.notation3] expandNotation3Molecules

Kyle Miller (Jan 11 2025 at 22:40):

There turned out to be some subtle differences in the items. This one's been tested a little:

def isNotation3ItemMolecule (p : Syntax) : Bool :=
  if let some atom := p[0].isStrLit? then atom.trim.any Char.isWhitespace else false

def expandNotation3Molecules : Lean.Macro := fun s => do
  let items := s[8].getArgs
  unless items.any isNotation3ItemMolecule do
    Macro.throwUnsupported
  let mut items' : Array Syntax := #[]
  for item in items do
    if let some s := item[0].isStrLit? then
      for atom in splitMolecule s do
        items' := items'.push <|  withRef item
          `(Mathlib.Notation3.notation3Item| $(quote atom):str)
    else
      items' := items'.push item
  return s.setArg 8 (mkNullNode items')

attribute [macro Mathlib.Notation3.notation3] expandNotation3Molecules

Patrick Massot (Jan 11 2025 at 22:45):

It seems to work much better, thanks!


Last updated: May 02 2025 at 03:31 UTC