Zulip Chat Archive

Stream: general

Topic: Minimizer script


Anne Baanen (Oct 11 2023 at 15:07):

I wrote a semi-automatic minimization script for creating a #mwe from a test case. This script is inspired by coq-bug-minimizer.

Basic usage is to create a file, say testcase.lean, that exhibits some error, for example a timeout. Run ./scripts/minimizer testcase.lean and it will automatically create and update a more minimal testcase.min.1.lean which exhibits the same error¹. The file is updated as more minimization passes succeed. When no more minimization applies and the script stops, or you are tired of waiting and you kill the script, you can edit testcase.min.1.lean manually, and restart the script on your edited output by running ./scripts/minimizer testcase.min.1.lean. The script produces testcase.min.2.lean etc.

The architecture applies a sequence of minimization passes repeatedly, stopping when no more pass has made progress. The passes can do basically anything with the source, and make progress when the new file is different but exhibits the same error. I've implemented a few naïve textual replacements like substituting everything that comes after a := with sorry. Since only a subset of those changes will actually work, the script includes a framework that bisects the set of changes to find exactly what can be done. That also means very brute-force changes like deleting declarations can be done in a pass.

Current transformations are:

  • stripping out comments
  • deleting #align and #align_import`
  • deleting declarations
  • replacing everything after := with sorry
  • deleting imports
  • replacing imports with the contents of those files

¹Currently, the script checks that the Lean compiler's stdout is the same on the minimized file and the original, ignoring filenames, line numbers and sorry.

Anne Baanen (Oct 11 2023 at 15:13):

Things are still very naïve and unoptimized, but it should be enough to get your code going.

Scott Morrison (Oct 11 2023 at 17:09):

This is amazing. A potential very useful transformation is replacing a structure field definition with sorry.

Scott Morrison (Oct 11 2023 at 17:12):

An unfortunately non-local transformation that is often essential for good minimisation is to remove a structure field entirely (which then requires removing it from all constructors, too). I guess this can be converted into a sequence of local transformations by allowing adding a default value of sorry?

Julian Berman (Oct 11 2023 at 17:31):

In case it's interesting, or even better perhaps useful, https://github.com/DRMacIver/structureshrink is a cool thing. Not sure if what the minimization script is doing can be coerced to run within it, but it implements a concrete shrinking algorithm.

Anne Baanen (Oct 11 2023 at 17:52):

Scott Morrison said:

An unfortunately non-local transformation that is often essential for good minimisation is to remove a structure field entirely (which then requires removing it from all constructors, too). I guess this can be converted into a sequence of local transformations by allowing adding a default value of sorry?

That's an interesting idea! But I suspect that the local transformation wouldn't work in case someone calls the constructor directly or uses anonymous structure notation. On the other hand, there's nothing in the setup that prohibits calling Lean during a pass, so we could try to use that as feedback for how to fix a broken build.

Anne Baanen (Oct 11 2023 at 17:57):

Julian Berman said:

In case it's interesting, or even better perhaps useful, https://github.com/DRMacIver/structureshrink is a cool thing. Not sure if what the minimization script is doing can be coerced to run within it, but it implements a concrete shrinking algorithm.

That's an interesting algorithm! I don't think we want to interface with structureshrink directly, since the notion of minimal is quite different (in particular, we want our file to grow when we inline imports).

Scott Morrison (Oct 12 2023 at 03:30):

Something I've found essential in minimization previously is to be able to run the sources against two versions of Lean, preserving that it works in one, and errors with a specified error message in the other.

Anne Baanen (Oct 12 2023 at 08:34):

I have left the script run on the field structure for DirectLimit for a night. Its progress is... not so impressive:

import Mathlib.Algebra.DirectSum.Basic
import Mathlib.LinearAlgebra.Basis
import Mathlib.LinearAlgebra.DFinsupp
import Mathlib.RingTheory.FreeCommRing
import Mathlib.RingTheory.Ideal.Quotient

universe u v w u₁

namespace DirectSum

open DirectSum

section General

variable {R : Type u} [Semiring R]

variable {ι : Type v} [dec_ι : DecidableEq ι]

variable {M : ι  Type w} [ i, AddCommMonoid (M i)] [ i, Module R (M i)]

instance : Module R ( i, M i) :=
  DFinsupp.module

instance {S : Type*} [Semiring S] [ i, Module S (M i)] [ i, SMulCommClass R S (M i)] :
    SMulCommClass R S ( i, M i) := sorry

instance {S : Type*} [Semiring S] [SMul R S] [ i, Module S (M i)] [ i, IsScalarTower R S (M i)] :
    IsScalarTower R S ( i, M i) := sorry

instance [ i, Module Rᵐᵒᵖ (M i)] [ i, IsCentralScalar R (M i)] : IsCentralScalar R ( i, M i) := sorry

theorem smul_apply (b : R) (v :  i, M i) (i : ι) : (b  v) i = b  v i := sorry

variable (R ι M)

def lmk :  s : Finset ι, ( i : (s : Set ι), M i.val) →ₗ[R]  i, M i := sorry

def lof :  i : ι, M i →ₗ[R]  i, M i :=
  DFinsupp.lsingle

theorem lof_eq_of (i : ι) (b : M i) : lof R ι M i b = of M i b := sorry

variable {ι M}

theorem single_eq_lof (i : ι) (b : M i) : DFinsupp.single i b = lof R ι M i b := sorry

theorem mk_smul (s : Finset ι) (c : R) (x) : mk M s (c  x) = c  mk M s x := sorry

theorem of_smul (i : ι) (c : R) (x) : of M i (c  x) = c  of M i x := sorry

variable {R}

theorem support_smul [ (i : ι) (x : M i), Decidable (x  0)] (c : R) (v :  i, M i) :
    (c  v).support  v.support := sorry

variable {N : Type u₁} [AddCommMonoid N] [Module R N]

variable (φ :  i, M i →ₗ[R] N)

variable (R ι N)

def toModule : ( i, M i) →ₗ[R] N :=
  FunLike.coe (DFinsupp.lsum ) φ

theorem coe_toModule_eq_coe_toAddMonoid :
    (toModule R ι N φ : ( i, M i)  N) = toAddMonoid fun i  (φ i).toAddMonoidHom := sorry

variable {ι N φ}

@[simp]
theorem toModule_lof (i) (x : M i) : toModule R ι N φ (lof R ι M i x) = φ i x := sorry

variable (ψ : ( i, M i) →ₗ[R] N)

theorem toModule.unique (f :  i, M i) : ψ f = toModule R ι N (fun i  ψ.comp <| lof R ι M i) f := sorry

variable {ψ} {ψ' : ( i, M i) →ₗ[R] N}

@[ext]
theorem linearMap_ext ψ ψ' : ( i, M i) →ₗ[R] N
    (H :  i, ψ.comp (lof R ι M i) = ψ'.comp (lof R ι M i)) : ψ = ψ' := sorry

def lsetToSet (S T : Set ι) (H : S  T) : ( i : S, M i) →ₗ[R]  i : T, M i := sorry

variable (ι M)

@[simps apply]
def linearEquivFunOnFintype [Fintype ι] : ( i, M i) ≃ₗ[R]  i, M i :=
  { DFinsupp.equivFunOnFintype with
    toFun := ()
    map_add' := fun f g  by
      ext
      rw [add_apply, Pi.add_apply]
    map_smul' := fun c f  by
      simp_rw [RingHom.id_apply]
      rw [DFinsupp.coe_smul] }

variable {ι M}

@[simp]
theorem linearEquivFunOnFintype_lof [Fintype ι] [DecidableEq ι] (i : ι) (m : M i) :
    (linearEquivFunOnFintype R ι M) (lof R ι M i m) = Pi.single i m := sorry

@[simp]
theorem linearEquivFunOnFintype_symm_single [Fintype ι] [DecidableEq ι] (i : ι) (m : M i) :
    (linearEquivFunOnFintype R ι M).symm (Pi.single i m) = lof R ι M i m := sorry

@[simp]
theorem linearEquivFunOnFintype_symm_coe [Fintype ι] (f :  i, M i) :
    (linearEquivFunOnFintype R ι M).symm f = f := sorry

protected def lid (M : Type v) (ι : Type* := PUnit) [AddCommMonoid M] [Module R M] [Unique ι] :
    ( _ : ι, M) ≃ₗ[R] M :=
  { DirectSum.id M ι, toModule R ι M fun _  LinearMap.id with }

variable (ι M)

def component (i : ι) : ( i, M i) →ₗ[R] M i := sorry

variable {ι M}

@[ext]
theorem ext {f g :  i, M i} (h :  i, component R ι M i f = component R ι M i g) : f = g := sorry

theorem ext_iff {f g :  i, M i} : f = g   i, component R ι M i f = component R ι M i g := sorry

@[simp]
theorem lof_apply (i : ι) (b : M i) : ((lof R ι M i) b) i = b := sorry

theorem component.of (i j : ι) (b : M j) :
    component R ι M i ((lof R ι M j) b) = if h : j = i then Eq.recOn h b else 0 := sorry

section CongrLeft

variable {κ : Type*}

def lequivCongrLeft (h : ι  κ) : ( i, M i) ≃ₗ[R]  k, M (h.symm k) := sorry

@[simp]
theorem lequivCongrLeft_apply (h : ι  κ) (f :  i, M i) (k : κ) :
    lequivCongrLeft R h f k = f (h.symm k) := sorry

end CongrLeft

section Sigma

variable {α : ι  Type*} {δ :  i, α i  Type w}

variable [ i j, AddCommMonoid (δ i j)] [ i j, Module R (δ i j)]

def sigmaLcurry : ( i : Σi, _, δ i.1 i.2) →ₗ[R]  (i) (j), δ i j := sorry

noncomputable def sigmaLuncurry [ i, DecidableEq (α i)] [ i j, DecidableEq (δ i j)] :
    ( (i) (j), δ i j) →ₗ[R]  i : Σ_, _, δ i.1 i.2 := sorry

end Sigma

section Option

variable {α : Option ι  Type w} [ i, AddCommMonoid (α i)] [ i, Module R (α i)]

@[simps]
noncomputable def lequivProdDirectSum : ( i, α i) ≃ₗ[R] α none ×  i, α (some i) :=
  { addEquivProdDirectSum with map_smul' := DFinsupp.equivProdDFinsupp_smul }

end Option

end General

section Submodule

section Semiring

variable {R : Type u} [Semiring R]

variable {ι : Type v} [dec_ι : DecidableEq ι]

variable {M : Type*} [AddCommMonoid M] [Module R M]

variable (A : ι  Submodule R M)

def coeLinearMap : ( i, A i) →ₗ[R] M :=
  toModule R ι M fun i  (A i).subtype

@[simp]
theorem coeLinearMap_of (i : ι) (x : A i) : DirectSum.coeLinearMap A (of (fun i  A i) i x) = x :=
  sorry

variable {A}

theorem IsInternal.submodule_iSup_eq_top (h : IsInternal A) : iSup A =  := sorry

theorem IsInternal.submodule_independent (h : IsInternal A) : CompleteLattice.Independent A := sorry

noncomputable def IsInternal.collectedBasis (h : IsInternal A) {α : ι  Type*}
    (v :  i, Basis (α i) R (A i)) : Basis (Σi, α i) R M where
  repr :=
    ((LinearEquiv.ofBijective (DirectSum.coeLinearMap A) h).symm ≪≫ₗ
        DFinsupp.mapRange.linearEquiv fun i  (v i).repr) ≪≫ₗ
      (sigmaFinsuppLequivDFinsupp R).symm

theorem IsInternal.collectedBasis_mem (h : IsInternal A) {α : ι  Type*}
    (v :  i, Basis (α i) R (A i)) (a : Σi, α i) : h.collectedBasis v a  A a.1 := sorry

theorem IsInternal.isCompl {A : ι  Submodule R M} {i j : ι} (hij : i  j)
    (h : (Set.univ : Set ι) = {i, j}) (hi : IsInternal A) : IsCompl (A i) (A j) := sorry

end Semiring

section Ring

variable {R : Type u} [Ring R]

variable {ι : Type v} [dec_ι : DecidableEq ι]

variable {M : Type*} [AddCommGroup M] [Module R M]

theorem isInternal_submodule_of_independent_of_iSup_eq_top {A : ι  Submodule R M}
    (hi : CompleteLattice.Independent A) (hs : iSup A = ) : IsInternal A := sorry

theorem isInternal_submodule_iff_independent_and_iSup_eq_top (A : ι  Submodule R M) :
    IsInternal A  CompleteLattice.Independent A  iSup A =  := sorry

theorem IsInternal.addSubgroup_independent {M : Type*} [AddCommGroup M] {A : ι  AddSubgroup M}
    (h : IsInternal A) : CompleteLattice.Independent A :=
  CompleteLattice.independent_of_dfinsupp_sumAddHom_injective' _ h.injective

end Ring

end Submodule

end DirectSum

open Submodule

variable {R : Type u} [Ring R]

variable {ι : Type v}

variable [dec_ι : DecidableEq ι] [Preorder ι]

variable (G : ι  Type w)

class DirectedSystem (f :  i j, i  j  G i  G j) : Prop where
  map_self' :  i x h, f i i h x = x
  map_map' :  {i j k} (hij hjk x), f j k hjk (f i j hij x) = f i k (le_trans hij hjk) x

section

variable [ i, AddCommGroup (G i)] [ i, Module R (G i)]

variable {G} (f :  i j, i  j  G i →ₗ[R] G j)

nonrec theorem DirectedSystem.map_map [DirectedSystem G fun i j h => f i j h] {i j k} (hij hjk x) :
    f j k hjk (f i j hij x) = f i k (le_trans hij hjk) x := sorry

variable (G)

def DirectLimit : Type max v w :=
  DirectSum ι G 
    (span R <|
      { a |
         (i j : _) (H : i  j) (x : _),
          DirectSum.lof R ι G i x - DirectSum.lof R ι G j (f i j H x) = a })

namespace DirectLimit

instance addCommGroup : AddCommGroup (DirectLimit G f) :=
  Quotient.addCommGroup _

instance module : Module R (DirectLimit G f) :=
  Quotient.module _

variable (R ι)

def of (i) : G i →ₗ[R] DirectLimit G f :=
  (mkQ _).comp <| DirectSum.lof R ι G i

end DirectLimit

namespace AddCommGroup

variable [ i, AddCommGroup (G i)]

def DirectLimit (f :  i j, i  j  G i →+ G j) : Type _ := sorry

variable (f :  i j, i  j  G i →+ G j)

variable (P : Type u₁) [AddCommGroup P]

variable (g :  i, G i →+ P)

variable (Hg :  i j hij x, g j (f i j hij x) = g i x)

namespace Ring

variable [ i, CommRing (G i)]

variable (f :  i j, i  j  G i  G j)

open FreeCommRing

def DirectLimit : Type max v w :=
  FreeCommRing (Σi, G i) 
    Ideal.span
      { a |
        ( i j H x, of (⟨j, f i j H x : Σi, G i) - of i, x = a) 
          ( i, of (⟨i, 1 : Σi, G i) - 1 = a) 
            ( i x y, of (⟨i, x + y : Σi, G i) - (of i, x + of i, y⟩) = a) 
               i x y, of (⟨i, x * y : Σi, G i) - of i, x * of i, y = a }

namespace DirectLimit

instance commRing : CommRing (DirectLimit G f) :=
  Ideal.Quotient.commRing _

end DirectLimit

namespace Field

variable (f' :  i j, i  j  G i →+* G j)

namespace DirectLimit

instance nontrivial [DirectedSystem G fun i j h => f' i j h] :
    Nontrivial (Ring.DirectLimit G fun i j h => f' i j h) := sorry

theorem exists_inv {p : Ring.DirectLimit G f} : p  0   y, p * y = 1 := sorry

open Classical

noncomputable def inv (p : Ring.DirectLimit G f) : Ring.DirectLimit G f :=
  if H : p = 0 then 0 else Classical.choose (DirectLimit.exists_inv G f H)

protected theorem mul_inv_cancel {p : Ring.DirectLimit G f} (hp : p  0) : p * inv G f p = 1 := sorry

@[reducible]
protected noncomputable def field [DirectedSystem G fun i j h => f' i j h] :
    Field (Ring.DirectLimit G fun i j h => f' i j h) :=
  { Ring.DirectLimit.commRing G fun i j h => f' i j h,
    DirectLimit.nontrivial G fun i j h => f' i j h with
    inv := inv G fun i j h => f' i j h
    mul_inv_cancel := fun p => DirectLimit.mul_inv_cancel G fun i j h => f' i j h
    inv_zero := dif_pos rfl }

Mario Carneiro (Oct 12 2023 at 10:30):

is it still converging?

Mario Carneiro (Oct 12 2023 at 10:34):

One thing you could do to speed it up, assuming that your edits are quick and don't require lean information but the checking time is long, is to perform many steps in one go, and bisect if you overshoot. This is a good option for edits you don't think are likely to fail (like inlining) and less good for cases where most edits will fail (like deleting declarations, if the input is already mostly tree-shaken)

Anne Baanen (Oct 12 2023 at 12:19):

In fact, the modifications I made last night already do this :D

I think the real issue is that changes that have been rejected in previous passes are retried in the next pass, when we should really only focus on the parts of the file that changed.


Last updated: Dec 20 2023 at 11:08 UTC