Zulip Chat Archive

Stream: lean4

Topic: Using `IO.Ref`

Damiano Testa (Aug 13 2024 at 11:13):

I would like to confirm that I am not doing anything wrong when working with IO.Ref: my understanding is very superficial!

To show my issue, there is the need for two files: one defining (initializing) the IO.Ref and one reading it.
File1:

import Lean.Elab.Command

initialize myRef : IO.Ref Bool ← IO.mkRef true

/-- checks whether the counter is `true` and, if so, then sets it to `false`.
It also prints a summary of what it is doing. -/
elab "to_false" : command => do
  if ← myRef.get then
    Lean.logInfo "Counter was `true`, setting it to `false`"
    myRef.set false

File2:

import File1

to_false -- Counter was `true`, setting it to `false`

If you have this setup, in File2 you will see the message that the command emits.

However, with essentialy whatever you type (adding a line-break, a new command, a space, virtually anything), the message disappears and will not be back until you reload the file.

Damiano Testa (Aug 13 2024 at 11:14):

I can guess that, once the Ref is set to false, it does not get "reset" with modifications of the file. Is there a way around this, ensuring that the message on the first to_false stays there if I modify the file?

Sebastian Ullrich (Aug 13 2024 at 11:15):

That is the purpose of environment extensions

Damiano Testa (Aug 13 2024 at 11:16):

So you are saying that I should be using an environment extension instead?

Damiano Testa (Aug 13 2024 at 11:16):

(I have never worked with them, so I really do not know what they are or how they work.)

Damiano Testa (Aug 13 2024 at 11:19):

Actually, the reason I was playing with this is for use in a linter: would an environment extension be something that a linter can modify?

Damiano Testa (Aug 13 2024 at 11:33):

Updated example:
File1:

import Lean.Elab.Command

open Lean

structure counter where
  f : Bool
  deriving Inhabited

initialize myCounter : Lean.EnvExtension counter ← Lean.registerEnvExtension (pure ⟨true⟩)

/-- checks whether the `EnvExtension` is `true` and, if so, then sets it to `false`.
It also prints a summary of what it is doing. -/
elab "EnvExt_false" : command => do
  if (myCounter.getState (← getEnv)).f then
    Lean.logInfo "EnvExt was `true`, setting it to `false`"
    setEnv <| myCounter.setState (← getEnv) ⟨false⟩

File2

import File1

EnvExt_false -- EnvExt was `true`, setting it to `false`

and EnvExt_false does not stop printing its message when I edit the file!

Damiano Testa (Aug 13 2024 at 11:34):

The next step is to check whether this will work with a linter.

Damiano Testa (Aug 13 2024 at 11:40):

Unfortunately, I do not seem to be able to modify the EnvExtension from within a linter.

Matthew Ballard (Aug 13 2024 at 11:52):

I think you need to write to the environment during the build of the oleans. The most common way this is done is via attribute tagging. I am not sure how useful it is to your use case but it gives examples of use.

Damiano Testa (Aug 13 2024 at 11:53):

Can you give me an example of something that does that?

Matthew Ballard (Aug 13 2024 at 12:00):

I’m away from my computer but rg 'addEntry' Mathlib/Tactic should provide many

Damiano Testa (Aug 13 2024 at 12:01):

Ok, thanks! This is probably enough: I just wanted to find an entry-point!

Matthew Ballard (Aug 13 2024 at 12:12):

You probably want a PersistentEnvExtension also

Damiano Testa (Aug 13 2024 at 12:13):

I am probably making things too complicated: all that I want is to pass the knowledge to the linter of whether it has already processed a command in a file or not.

Damiano Testa (Aug 13 2024 at 12:13):

(Note that import statements are not "seen" by the linter, so "the first command" really means "the first non-import command", but that is by default and also what I want!)

Matthew Ballard (Aug 13 2024 at 12:14):

Processed in a separate file?

Damiano Testa (Aug 13 2024 at 12:15):

I am trying to check whether each linted file starts with a doc-module string and report it if it does not.

Damiano Testa (Aug 13 2024 at 12:15):

So, every file that imports the linter will be processed.

Damiano Testa (Aug 13 2024 at 12:16):

If I understood your question correctly, then yes: the linter is in one file and every downstream file from there is where I want the check to happen.

Damiano Testa (Aug 13 2024 at 12:29):

Reading the docs, I think that PersistentEnvExtensions may not be what I want: if I understand correctly, the Persistent part is that it passes on information to the downstream modules. While I do not really care about that information, I just want to perform a check on each module separately.

Matthew Ballard (Aug 13 2024 at 12:34):

Sorry, I’m not exactly sure what is wanted. It seemed from your example that you wanted to persist some state from one file to the next.

Damiano Testa (Aug 13 2024 at 12:36):

Ah no, I wanted the state of each file to change upon reaching some command. The example above worked as intended for the actual command EnvExt_false, but linters do not seem to be able to globally modify environment extensions.

Damiano Testa (Aug 13 2024 at 12:36):

So, it appears that within the linter, when it is processing a command, I can modify the envExtension, but as soon as the linter releases the current command, the environment switches back to what it was before the linter modified it.

Damiano Testa (Aug 13 2024 at 12:37):

I suspect that linters run in a withoutModifyingEnv wrapper, so I am really not sure what to do.

Damiano Testa (Aug 13 2024 at 12:37):

Their modifications of IO.Refs do work, but the IO.Refs are flimsy and need to reload the file to be reliably used.

Sebastian Ullrich (Aug 13 2024 at 12:43):

I think what you actually want is for a linter to be able to access any syntax tree/snapshot up to the current command, which is a request that has come up before

Sebastian Ullrich (Aug 13 2024 at 12:44):

Then you don't need to carry any state yourself, which is good because we want linters to run in parallel to elaboration

Damiano Testa (Aug 13 2024 at 12:44):

That would certainly be enough! I think that for this specific case, a much smaller amount of information would be already sufficient!

Damiano Testa (Aug 13 2024 at 12:45):

While we are here, would it be possible to have an option to make the linters run before the corresponding the command that they are linting?

Sebastian Ullrich (Aug 13 2024 at 12:46):

This is a bit of speculation at this point but if we give linters access to the snapshot of the command, we could have them decide for themselves if they want to wait for the snapshot after elaboration or not

Damiano Testa (Aug 13 2024 at 12:47):

Ok, I have wanted the option of choosing to run a linter before its command very often.

Damiano Testa (Aug 13 2024 at 12:47):

The typical case is that you modify the command, say by replacing a proof by sorry and you want to see if it still compiles, up to the sorry message.

Damiano Testa (Aug 13 2024 at 12:48):

Currently, you have to find a way to make the name not clash with the one of the current declaration.

Damiano Testa (Aug 13 2024 at 12:48):

If you could run the linter before the command, you could simply make it not modify the environment and be done!

Sebastian Ullrich (Aug 13 2024 at 12:49):

I'm not sure I see how that wouldn't utterly tank performance unless you're extremely careful about what commands you want to modify and test

Damiano Testa (Aug 13 2024 at 12:50):

These are more "diagnostic" linters: I do not intend to run them everywhere, but either every once in a while, or just on some files.

Damiano Testa (Aug 13 2024 at 12:51):

E.g., imagine that at some point you want to see which lemmas could be proved by by simp. You run your linter once a week and tally the results.

Damiano Testa (Aug 13 2024 at 12:53):

Another situation that I have encountered is when you want to know if using a terminal refine is quicker than an exact: you run the linter and wait for the results!

Damiano Testa (Aug 13 2024 at 15:58):

Can I just get confirmation that linters cannot modify EnvExtensions, whether they are persistent or not? I have been failing to do it, but I do not know if it is just that I am misunderstanding how to do that! :smile:

Mario Carneiro (Aug 13 2024 at 20:14):

they cannot do it, because EnvExtensions store data in the Environment, and linters are run inside a withoutModifyingEnv block

Mario Carneiro (Aug 13 2024 at 20:15):

actually I take it back, looking at the code it seems like there is nothing stopping them from modifying the environment

Damiano Testa (Aug 13 2024 at 21:38):

Mario, if you could show me some code where a linter leaves a trace of it linting something, I'd be very grateful!

Damiano Testa (Aug 13 2024 at 21:39):

All I really would like is just to know if a linter already linted something in a file or not, but anything would be welcome!

Damiano Testa (Aug 13 2024 at 21:40):

Also, where do you see what the linters do? I have been able to figure something out by seeing the "in action", but I have not found the code that is used to run them.

Mario Carneiro (Aug 13 2024 at 21:51):

search for runLinters in core

Damiano Testa (Aug 13 2024 at 21:52):

Oh, in Lean.Elab.Command?

Damiano Testa (Aug 13 2024 at 21:54):

If I am reading this correctly, it saves the state before running each linter and then sets it back but returns the messages!

Damiano Testa (Aug 13 2024 at 21:54):

So, I could embed persistent information in the messages!

Mario Carneiro (Aug 13 2024 at 21:55):

none of that seems to be rolling back the environment though, AFAIK

Damiano Testa (Aug 13 2024 at 21:56):

Doesn't the State contain the Environment? That seems to be saved and returned by the linters.

Mario Carneiro (Aug 13 2024 at 21:56):

oh, yes it's the let savedState ← get ... modify fun s => { savedState with messages := s.messages } part

Damiano Testa (Aug 13 2024 at 21:57):

Right, the only thing that seems to filter through are the messages.

Mario Carneiro (Aug 13 2024 at 21:58):

I don't think you get access to those messages later though, they are collected in the server

Damiano Testa (Aug 13 2024 at 21:59):

I guess that you are right: you can detect them outside of the linter, but not pass them on "silently" to the linting of the next command.

Damiano Testa (Aug 13 2024 at 22:06):

Maybe linters can store some information in the infotrees? Anyway, time for bed here!

Thanks a lot, though! This has been very fruitful!

Damiano Testa (Aug 14 2024 at 05:34):

In case anyone here is interested, I was able to get an IO.Ref to work for a new linter: #15794.

Sebastian Ullrich (Aug 14 2024 at 08:11):

Just to mention one more caveat without looking too far into your code: did you consider the thread safety of such a linter? The language server is an asynchronous program.

Damiano Testa (Aug 14 2024 at 08:19):

I would be happy to consider thread safety, but I do not know what it is... :upside_down:

Damiano Testa (Aug 14 2024 at 08:20):

Is it a valid/useful data point that the linter successfully identified all of mathlib?

Henrik Böving (Aug 14 2024 at 08:23):

Damiano Testa said:

I would be happy to consider thread safety, but I do not know what it is... :upside_down:

Thread safety is relevant once your function is not the only thing that may access some piece of data (in this case your IO.Ref) at any given time. This can happen when someone is using your linter while editing code with the language server. The fun thing about thread safety is that violations of it do not necessarily have to show up for everyone always when using the feature. Due to the way that concurrency etc. works it can be a sporadic, hard to observe but confusing phenomenon.

Damiano Testa (Aug 14 2024 at 08:29):

Oh, I see! Thanks for the explanation!

Damiano Testa (Aug 14 2024 at 08:29):

I have certainly observed thread unsafety when I was using IO.Refs, although I have not played enough with this linter to know if this one is or not thread safe. I did not observe "funny behaviour", but also did not use it extensively.

Damiano Testa (Aug 14 2024 at 08:30):

Just to be clear: in the aseptic setting of CI running lake build, thread safety is completely irrelevant, right?

Henrik Böving (Aug 14 2024 at 08:31):

At the moment yes, lake runs one process per file and that process is currently still single threaded (though that is partially going to change when we have in file parallelism of course)