Zulip Chat Archive

Stream: lean4 dev

Topic: go to definition jumps to incorrect file


Xiyu Zhai (Mar 07 2024 at 18:16):

I'm a newbie trying to help by starting from bug fixes.
I'm looking at a 2yr old issue https://github.com/leanprover/lean4/issues/1170.
The issue is
"If there are two different files that contain declarations with the same name, the go to definition feature will sometimes jump to the declaration in the wrong file. This seems to happen consistently if the offending files are all open in a particular editor group (in tabs)."
On my computer, I can easily replicate this bug.
I'm a newbie so I'm just trying to understand the situation.
It seems to me that it could be caused by Environment sharing.
The following code finds location links from decl given uri.

open Elab in
def locationLinksFromDecl (srcSearchPath : SearchPath) (uri : DocumentUri) (n : Name)
    (originRange? : Option Range) : MetaM (Array LocationLink) := do
  let mod?  findModuleOf? n
  let modUri?  match mod? with
    | some modName => documentUriFromModule srcSearchPath modName
    | none         => pure <| some uri

  let ranges?  findDeclarationRanges? n
  if let (some ranges, some modUri) := (ranges?, modUri?) then
    let ll : LocationLink := {
      originSelectionRange? := originRange?
      targetUri := modUri
      targetRange := ranges.range.toLspRange
      targetSelectionRange := ranges.selectionRange.toLspRange
    }
    return #[ll]
  return #[]

function findModuleOf? is invoked, defined as

def findModuleOf? [Monad m] [MonadEnv m] [MonadError m] (declName : Name) : m (Option Name) := do
  discard <| getConstInfo declName -- ensure declaration exists
  match ( getEnv).getModuleIdxFor? declName with
  | none        => return none
  | some modIdx => return some (( getEnv).allImportedModuleNames[modIdx.toNat]!)

, which calls getModuleIdxFor defined as

def getModuleIdxFor? (env : Environment) (declName : Name) : Option ModuleIdx :=
  env.const2ModIdx.find? declName

which uses env.const2ModIdx.

env.const2ModIdx is of type HashMap Name ModuleIdx, which map each name to a module.

Now it seems to me that for different files, we are using the same environment, and thus unavoidably there could be name clashing.

My experiments show that if only one file is open, there is no such issue. Partially convinces me this is the issue.

I guess there should be a separate Environment for each file?

I'm just trying to see if my understanding of the origin of the bug is correct. Fixing the bug could be an overhaul though.

Xiyu Zhai (Mar 07 2024 at 18:23):

Technically I only started looking at the source code seriously today, so I could be totally wrong

Xiyu Zhai (Mar 07 2024 at 18:24):

I guess this is not a problem occurring in compilation because in compilation there is no sharing of Environment?

Kyle Miller (Mar 07 2024 at 18:26):

(This is my first time looking at this LSP server code.) There's not necessarily anything wrong with locationLinksFromDecl itself, since part of MetaM is the environment state. So long as the function is called with the Environment set correctly, it's looks like it should work fine.

Xiyu Zhai (Mar 07 2024 at 18:29):

I'm not suggesting locationLinksFromDecl is wrong though. I'm suggesting that Environment's field const2ModIdx is not detailed enough. Maybe HashMap (ModuleIdx, Name) ModuleIdx?

Kyle Miller (Mar 07 2024 at 18:29):

Not only is there a separate Environment per file, but there are multiple checkpoints of the environment, from between every command, all saved in various Lean.Elab.Command.State objects.

Xiyu Zhai (Mar 07 2024 at 18:29):

I see.

Xiyu Zhai (Mar 07 2024 at 18:29):

Thanks a lot

Xiyu Zhai (Mar 07 2024 at 18:30):

Then the bug should be somewhere else

Kyle Miller (Mar 07 2024 at 18:33):

To help you with the search, in a function like locationLinksOfInfo, you can see code like

  match i with
  | .ofTermInfo ti =>
    let mut expr := ti.expr
    ...
    match expr with
    | Expr.const n .. => return  ci.runMetaM i.lctx <| locationLinksFromDecl i n
    ...

The runMetaM function is what is setting up the Environment, among other things, from the data in ci.

Xiyu Zhai (Mar 07 2024 at 18:33):

Wow, thank you so much! I guess I need to figure out a way to print debug the lean server. I'm a newbie, lol.

Kyle Miller (Mar 07 2024 at 18:49):

Maybe it's worth logging information about infoWithCtx in Lean.Server.FileWorker.handleDefinition. For example, you could use the map₂ field of constants in the context's Environment to deduce which environment you're looking at. The map₂ field is for locally defined constants, rather than imported ones. There are probably better ways to debug this, but that's something that came to mind.

Xiyu Zhai (Mar 07 2024 at 18:49):

Thanks a lot! That’s extremely helpful!

Xiyu Zhai (Mar 08 2024 at 07:47):

Seems that I've found out the culprit. The server uses a global variable references to handle the request, thus causing the trouble.
handleDefinition is not the function actually being called.
In Watchdog.lean the function handleRequest is called, then in the wierd if statement before the match statement

if method == "textDocument/definition" || method == "textDocument/declaration" then
      let params  parseParams TextDocumentPositionParams params
      let definitions  findDefinitions params
      dbg_trace s!"handleRequest method = {method} params = {params} defns = {ToJson.toJson definitions}"
      if !definitions.isEmpty then
        ( read).hOut.writeLspResponse id, definitions
        return
    match method with

Just look at the invokation of findDefinitions, which goes to

def findDefinitions (p : TextDocumentPositionParams) : ServerM <| Array Location := do
  let mut definitions := #[]
  if let some path := fileUriToPath? p.textDocument.uri then
    let srcSearchPath := ( read).srcSearchPath
    if let some module  searchModuleNameOfFileName path srcSearchPath then
      let references  ( read).references.get
      for ident in references.findAt module p.position (includeStop := true) do
        if let some definitionLocation, _  references.definitionOf? ident srcSearchPath then
          definitions := definitions.push definitionLocation
  return definitions

Notice that

      let references  ( read).references.get
      for ident in references.findAt module p.position (includeStop := true) do
        if let some definitionLocation, _  references.definitionOf? ident srcSearchPath then
          definitions := definitions.push definitionLocation

, the references are got from server context, the type of which is defined by

structure ServerContext where
    hIn            : FS.Stream
    hOut           : FS.Stream
    hLog           : FS.Stream
    /-- Command line arguments. -/
    args           : List String
    fileWorkersRef : IO.Ref FileWorkerMap
    /-- We store these to pass them to workers. -/
    initParams     : InitializeParams
    workerPath     : System.FilePath
    srcSearchPath  : System.SearchPath
    references     : IO.Ref References

So references is a global variable, shared by all files (I believe with high confidence), then definitions are collected from this references global variable. The type of this global variable is

structure References where
  /-- References loaded from ilean files -/
  ileans : HashMap Name (System.FilePath × Lsp.ModuleRefs)
  /-- References from workers, overriding the corresponding ilean files -/
  workers : HashMap Name (Nat × Lsp.ModuleRefs)

See the fields ileans, a name is mapped to a file path and a collection of references.
Hence, if two different set of compilations with clashing names exist for the same time, it causes problems.

Marc Huisinga (Mar 08 2024 at 08:07):

Good diagnosis! Yes, that's one of the known issues with the References data structure. I was going to fix this until the next rc1 release :-)

Xiyu Zhai (Mar 08 2024 at 08:15):

Lol. You know all along. But I did learn a lot.

Notification Bot (Mar 08 2024 at 08:17):

Xiyu Zhai has marked this topic as resolved.

Marc Huisinga (Mar 08 2024 at 08:20):

@Sebastian Ullrich mentioned it to me recently. I hadn't diagnosed the issue myself yet!

By the way, the original reason for why we prefer to use .ileans for go-to-definition was that we want go-to-definition to jump to the current location if the definition is from a file that the user edited, but hasn't saved or compiled yet (i.e. the current data isn't available via .oleans). Since we need to synchronize up-to-date .ilean data from the file workers to the watchdog anyways for other "global" requests, we also re-use them for go-to-definition if we have them available.

Notification Bot (Mar 08 2024 at 08:21):

Xiyu Zhai has marked this topic as unresolved.

Xiyu Zhai (Mar 08 2024 at 08:22):

I understand the problem as user might use the same server to analyze two totally separate compilations, where there could be name clashes. There should be separate data for each "package".

Xiyu Zhai (Mar 08 2024 at 08:23):

In my own programming lauguage, basically everything is localized to a certain package, crate or module. I believe this is the same for rust-analyzer.

Xiyu Zhai (Mar 08 2024 at 08:25):

Btw, I'm curious about any plan of introducing fine-grained incremental computation system into the compilation system. In a fine-grained incremental computation compilation, change in each function affects others in a minimal sense. But this could be very complicated to implement.

Marc Huisinga (Mar 08 2024 at 08:29):

This is on our roadmap and @Sebastian Ullrich continues to work on this topic. The first PR of several is at lean4#3014 :-)


Last updated: May 02 2025 at 03:31 UTC