Zulip Chat Archive

Stream: lean4

Topic: converting between `Lean.Position` and... something else!


Damiano Testa (May 29 2024 at 06:44):

I am trying to performed "aimed substitutions", where Lean reports the location of some syntax to be changed and some other script (in bash/python/...) actually performs the substitutions.

The main stumbling point, though, is that I can easily get a line number that I can work with, but not a usable column number. The underlying issue is that I do not know how to use the character count that Lean (and VSCode) uses with what sed/awk/python use.

Does anyone know how to either get Lean to report something that can be used by another program or how to use what Lean natively reports elsewhere?

Thanks!

Damiano Testa (May 29 2024 at 06:46):

For instance, the angle bracket counts as 1 character for Lean, but more than 1 for everything else that I tried.

Eric Wieser (May 29 2024 at 06:49):

It should be one character in Python too

Eric Wieser (May 29 2024 at 06:51):

The keyword distinction here is unicode codepoint index vs utf-8-encoded byte offset

Eric Wieser (May 29 2024 at 06:51):

I think docs#Lean.FileMap does what you want

Damiano Testa (May 29 2024 at 06:52):

I have tried with that, but I am not sure that every character is encoded in the same way. I'll try to find an example, though probably this afternoon.

Damiano Testa (May 29 2024 at 06:52):

The filemap converts between a single number to a pair (line, column), right?

Eric Wieser (May 29 2024 at 06:52):

Or docs#Lean.FileMap.ofPosition to go in the other direction

Damiano Testa (May 29 2024 at 06:53):

However, I think that it does not actually change the "column" count, it only tells you how far you are from a line break, with the same measure for individual characters.

Damiano Testa (May 29 2024 at 06:53):

Ok, I'll play a bit more with those, but I have not had success earlier.

Eric Wieser (May 29 2024 at 06:54):

The value returned by Lean.FileMap.ofPosition is a byte offset, so to use it in Python you will need to use open(..., 'rb')

Damiano Testa (May 29 2024 at 06:55):

Ah, 'rb' was not suggested by chatGPT! I had 'r'!

Damiano Testa (May 29 2024 at 06:55):

Thanks, I will try it and will report if I still have problems!

Eric Wieser (May 29 2024 at 06:57):

Why not do the replacement on the Lean side?

Damiano Testa (May 29 2024 at 06:58):

Because the reformatting makes it very hard to "just add a ? before this _.

Damiano Testa (May 29 2024 at 06:58):

It will add line breaks everywhere.

Damiano Testa (May 29 2024 at 06:59):

(Or, I do not know how to extract the source string for the syntax that Lean is parsing.)

Eric Wieser (May 29 2024 at 06:59):

You could read the source file by hand in the same way you are doing in Python

Damiano Testa (May 29 2024 at 07:00):

True, I like this better!

Richard Copley (May 29 2024 at 07:27):

Eric Wieser said:

The value returned by Lean.FileMap.ofPosition is a byte offset, so to use it in Python you will need to use open(..., 'rb')

Is this the kind of "position" that is sent to LSP clients in diagnostics and so on?
If so column is a count of UTF-16 code units (not bytes, not characters, not glyphs, not columns, not Unicode code points). Very awkward, but it is what the LSP spec originally required. You will need Unicode functions.

Richard Copley (May 29 2024 at 07:41):

No! Lean.FileMap.ofPosition is a String.Pos, which should be a UTF-8 offset (a byte, like you said, assuming the file is encoded in UTF-8 as it should be). The UTF-16 things are Lean.Lsp.Position.

Damiano Testa (May 29 2024 at 11:16):

Thank you both, I had a quick look and it seems that Eric's suggestion of working directly in Lean works great!

This way, I do not have to worry about conversions: I just work with what Lean gives.

Damiano Testa (May 29 2024 at 20:00):

Mostly for my future reference, here is one of the main traps in which I fell -- over and over again:

import Lean.Elab.Command

run_cmd do
  let test := "⟨⟩"
  for i in [1, 2] do
    let shortTest := test.take i
    if shortTest.toList == (List.range shortTest.length).map (shortTest.get ⟨·⟩)
    then dbg_trace "equal"
    else dbg_trace "different"

Eric Wieser (May 29 2024 at 22:17):

Yes, to convert a nat to a codepoint position within s you probably want (s.iter.nextn n).pos

Eric Wieser (May 29 2024 at 22:18):

Using String.Pos.mk, as you do above, is the wrong choice for sure

Damiano Testa (May 29 2024 at 22:24):

Thank you for the suggestion to use Lean directly: the script performed about 1000 surgical replacements and (possibly with one exception) wrote correct Lean code!

Damiano Testa (May 29 2024 at 22:24):

#13357 is the corresponding PR.


Last updated: May 02 2025 at 03:31 UTC