Zulip Chat Archive

Stream: metaprogramming / tactics

Topic: False negatives in PtrSet.contains


Sven Manthe (Jan 08 2025 at 12:02):

In a (by now obsolete) application @Floris van Doorn and me tried to use a modified version of Lean.Expr.getUsedConstants. Internally, while traversing the expression, this avoids revisiting the same subterm by maintaining a PtrSet of already visited subterms. However, in some situations (maybe if it is called on an expression before some postprocessing step is executed?), the caching mechanism fails and the subterms are revisited.
This seems to lead to an exponential blowup in running time, which is also realized in practice. Replacing PtrSet by Std.HashSet in the implementation fixed this issue. It is not clear to me whether the issue is a bug in PtrSet, or that getUsedConstants should not use PtrSet, or that getUsedConstants should not be used for some expressions (which ideally would be documented...).
Below is code containing modifications of foldConsts that reproduce theses issues. Note that foldConsts internally uses a second caching mechanism on the leaves that always works fine. Since this second mechanism only makes the issue harder to diagnose by avoiding duplicates in the output, but does not solve the performance issue, it is removed completely below.

import Lean.Util.FoldConsts
import Lean.Elab

open Lean Meta Elab Tactic Term Expr.FoldConstsImpl

--copy getUsedConstants, but modify the definition to use Std.HashSet instead PtrSet and remove caching for leaves
unsafe structure State' where
 visited       : Std.HashSet Expr := {}
unsafe abbrev FoldM' := StateM State'
unsafe def foldCopy {α : Type} (f : Name  α  α) (e : Expr) (acc : α) : FoldM' α :=
  let rec visit (e : Expr) (acc : α) : FoldM' α := do
    if ( get).visited.contains e then
      return acc
    modify fun s => { s with visited := s.visited.insert e }
    match e with
    | .forallE _ d b _   => visit b ( visit d acc)
    | .lam _ d b _       => visit b ( visit d acc)
    | .mdata _ b         => visit b acc
    | .letE _ t v b _    => visit b ( visit v ( visit t acc))
    | .app f a           => visit a ( visit f acc)
    | .proj _ _ b        => visit b acc
    | .const c _         =>
      return f c acc
    | _ => return acc
  visit e acc
@[inline] unsafe def foldUnsafeCopy {α : Type} (e : Expr) (init : α) (f : Name  α  α) : α :=
  (foldCopy f e init).run' {}
@[implemented_by foldUnsafeCopy]
opaque Lean.Expr.foldConstsCopy {α : Type} (e : Expr) (init : α) (f : Name  α  α) : α := init
def getUsedConstantsCopy (e : Expr) : Array Name :=
  e.foldConstsCopy #[] fun c cs => cs.push c

--copy getUsedConstants, but modify the definition to remove caching for leaves
unsafe def foldCopy2 {α : Type} (f : Name  α  α) (e : Expr) (acc : α) : FoldM α :=
  let rec visit (e : Expr) (acc : α) : FoldM α := do
    if ( get).visited.contains e then
      return acc
    --this yields output below, showing the failure of PtrSet.contains
    /-let hs : Std.HashSet (Ptr Expr) := (← get).visited
    for x in hs do
      if x.value == e then dbg_trace "PtrSet failure {e}"-/
    modify fun s => { s with visited := s.visited.insert e }
    match e with
    | .forallE _ d b _   => visit b ( visit d acc)
    | .lam _ d b _       => visit b ( visit d acc)
    | .mdata _ b         => visit b acc
    | .letE _ t v b _    => visit b ( visit v ( visit t acc))
    | .app f a           => visit a ( visit f acc)
    | .proj _ _ b        => visit b acc
    | .const c _         =>
      return f c acc
    | _ => return acc
  visit e acc
@[inline] unsafe def foldUnsafeCopy2 {α : Type} (e : Expr) (init : α) (f : Name  α  α) : α :=
  (foldCopy2 f e init).run' {}
@[implemented_by foldUnsafeCopy2]
opaque Lean.Expr.foldConstsCopy2 {α : Type} (e : Expr) (init : α) (f : Name  α  α) : α := init
def getUsedConstantsCopy2 (e : Expr) : Array Name :=
  e.foldConstsCopy2 #[] fun c cs => cs.push c

elab "printConsts" tacs:ppDedent(tacticSeq) : tactic => do
  let target  getMainTarget
  let goal  getMainGoal
  let newGoal  mkFreshExprMVar target
  setGoals [newGoal.mvarId!]
  Elab.Tactic.evalTactic tacs
  setGoals [goal]
  let prf  instantiateMVars newGoal
  logInfo m! "{getUsedConstantsCopy2 prf}" --duplicates, thus performance issue
  logInfo m! "{getUsedConstantsCopy prf}" --no duplicates as caching works
  logInfo m! "{Lean.Expr.getUsedConstants prf}" --no duplicates due to second cache, but performance issue persists
example : Prop := by
  printConsts exact 0 + 0 = 0 + 0 --duplicates
  exact 0 + 0 = 0 + 0
example : 0 + 0 = 0 + 0 := by
  printConsts rfl --no duplicates
  rfl

Sebastian Ullrich (Jan 08 2025 at 12:16):

A HashSet cache introduces sharing, a PtrSet merely preserves sharing. Which of these is more appropriate completely depends on your specific use case and data. Note that after ShareCommon.shareCommon', PtrSet transformations should be as effective (and still more efficient) as HashSets

Sven Manthe (Jan 10 2025 at 10:13):

I see. Of course, I don't know much about the design decisions here, but it may be helpful if "general purpose functions" like getUsedConstants that become very slow by not introducing sharing in some cases (like on the proof terms I generated in practice) would document in some way that this happens and can be avoided by using ShareCommon before

Sebastian Ullrich (Jan 10 2025 at 10:47):

Yes, this could be considered a docbug


Last updated: May 02 2025 at 03:31 UTC