Zulip Chat Archive
Stream: general
Topic: Canonical
Chase Norman (Jan 05 2025 at 23:10):
@Jared green We are planning to release Canonical this Spring!
Ruben Van de Velde (Jan 05 2025 at 23:24):
Spring in which hemisphere? :)
Jason Rute (Jan 06 2025 at 00:26):
@Chase Norman What is Canonical?
Chase Norman (Jan 07 2025 at 03:54):
@Jason Rute chasenorman.com It's an automated theorem prover for dependent type theory. If you recall, it was the big goal I was discussing at IPAM Machine Assisted Proofs.
Jason Rute (Jan 07 2025 at 12:44):
@Chase Norman will it eventually be a lean tactic (either native or through a hammer-like interface or through a plugin)? How does it compare to Duper in both scope and capabilities? What applications will it shine on? How does it handle many available premises or many applicable premises? Does it need a premise/lemma selector for real-world applications? It works by application, right? So does/will it use the Lean discrimination tree for the apply? tactic, or does it have it’s own data structures?
Jason Rute (Jan 07 2025 at 14:28):
Also @Victor Maia says his tool can be used for proof synthesis in Lean (or Kind, his theorem prover). So maybe it is a competitor to Duper or Canonical. https://x.com/victortaelin/status/1876579350382096745?s=46
Chase Norman (Jan 07 2025 at 22:10):
@Jason Rute Canonical will be a Lean tactic on Day 1, through a hammer-like interface. Canonical is a sound and complete proof search for dependent type theory, and generates terms--not tactic based proofs.
Canonical has no knowledge of theories like equality, natural numbers, or classical logic. It receives this information directly from Lean's internal representation. Canonical is made for higher-order reasoning where you intend to have a short proof term as a witness; think synthesizing motives for (generalized) induction or other axiom schema, working with user-written (indexed) inductive types, program synthesis by specification or examples, proof minimization, non-Prop types like isomorphisms and typeclasses, generating multiple witnesses of a given type, etc.
This is in contrast to Duper, which is strong at inhabiting Prop types for problems that translate into a first-order logic, and can handle large numbers of premises and well-understood theories like equality and LEM. Duper is intended to close goals where you don't really care what is generated. Canonical replaces itself with an exact proof term upon completion and does not stay in your code.
To facilitate millions of steps per second and a complete search procedure, Canonical has its own internal representation and outputs a complete term without interacting with Lean.
Asei Inoue (Jan 10 2025 at 00:30):
That sounds nice
Jared green (Feb 27 2025 at 21:08):
@Chase Norman can we get a more precise time frame for the availability of Canonical?
Asei Inoue (Mar 08 2025 at 06:29):
I have found suggest_premises tactic is added in Lean.
any related?
Chase Norman (Mar 08 2025 at 09:40):
@Jared green April
Jared green (Mar 10 2025 at 02:42):
@Chase Norman where your page says Canonical only generates canonical forms, how is that defined? how do you assure that?
Mario Carneiro (Mar 10 2025 at 23:40):
IIUC it normalizes things at parse time
Chase Norman (Mar 10 2025 at 23:45):
@Jared green @Mario Carneiro In general it does not require normalization at parse time. Canonical cannot place a lambda inside an application (by construction), cannot place a constructor in the major argument of a recursor/projection (hard-coded), and always places as many lambdas as is required to be eta-long (by construction, nontrivial). In this sense it only searches canonical forms. The internal representation of Canonical is BNEL, but can support beta unreduced forms using let definitions.
Chase Norman (Mar 10 2025 at 23:47):
You can choose which definitions Canonical can use, so canonicity with respect to delta reduction is at the user’s discretion.
Jared green (Mar 31 2025 at 21:24):
@Chase Norman can we get a more precise release time for Canonical?
Chase Norman (Apr 18 2025 at 06:30):
For those on this thread that have been waiting for quite some time, I want to inform you that Canonical is "released." The tactic, source code, arXiv paper, and demo videos are on my website https://chasenorman.com. I have spent much of the past month battling build and link issues, and these are still ongoing. I'm choosing not to make a big announcement on this Zulip and elsewhere yet, because:
- precompileModules is broken on the latest version of MacOS.
lake buildand VSCode builds expect dynlibs to be in different places.- I am struggling to get
buildArchiveto download from a github release, so I am not currently packaging dynlibs for Intel Macs or Linux on Arm. - The
dynlibsfeature intended to directly support the linking of dynamic libraries has not been released yet (coming withv4.19.0). - Users that attempt to reset their build by deleting the Canonical package directory will find that it is not compiled upon redownloading.
- There are miscellaneous build issues even for some users meeting the system requirements, such as
undefined symbol: lean_alloc_smallon Ubuntu,missing symbolsandld64.lld: error: library not found for -lcanonicalon MacOS. - Windows users require extra installation instructions to ensure that the
canonical.dllis linked.
In short, I want to make sure that users have an easy time installing Canonical, and Lean's dynlib functionality is still being developed.
Kim Morrison (Apr 20 2025 at 04:26):
I also encountered a build issue, which seems to be different from those above:
info: stderr:
libc++abi: terminating due to uncaught exception of type lean::exception: error loading plugin, initializer not found 'initialize_canonical'
error: Lean exited with code 134
For me, this reproduces by checking out branch#canonical of Mathlib and typing lake build.
Chase Norman (Apr 20 2025 at 04:41):
@Kim Morrison I think on 4.19 they are revamping the “plugin” system. Perhaps trying this on 4.18 will avoid this particular error.
Kim Morrison (Apr 20 2025 at 07:22):
Thanks, I've rebased branch#canonical on the v4.18.0 tag of Mathlib, and now get the error:
info: stderr:
libc++abi: terminating due to uncaught exception of type lean::exception: error loading library, dlopen(/Users/kim/projects/lean/mathlib4-4/.lake/packages/Canonical/.lake/build/lib/lean/Canonical.dylib, 0x0009): Library not loaded: @loader_path/libcanonical.dylib
Referenced from: <4C4C4429-5555-3144-A1EB-2D39C1A645EE> /Users/kim/projects/lean/mathlib4-4/.lake/packages/Canonical/.lake/build/lib/lean/Canonical.dylib (built for macOS 99.0 which is newer than running OS)
Reason: tried: '/Users/kim/projects/lean/mathlib4-4/.lake/packages/Canonical/.lake/build/lib/lean/libcanonical.dylib' (no such file)
Chase Norman (Apr 20 2025 at 07:41):
Is this on lake build or via VSCode?
Chase Norman (Apr 20 2025 at 07:44):
And I apologize for this. It really should be one-click; I hope it’s clear why I am waiting
Kim Morrison (Apr 20 2025 at 09:03):
Just lake build! Somehow in VSCode it seems to work!
Also, I get multiple
warning: Canonical: repository '././.lake/packages/Canonical' has local changes
warnings.
Kim Morrison (Apr 20 2025 at 09:04):
Chase Norman said:
And I apologize for this. It really should be one-click; I hope it’s clear why I am waiting
No worries -- I'm looking forward to getting to try it out, and I appreciate the effort to make things one-click for users.
Kim Morrison (Apr 20 2025 at 09:10):
I'm trying it out in VSCode a bit. Can I suggest that when you find a proof and show the info message, you also discharge the goal with sorry, so as to avoid a red squiggly (which nearly obscures the blue squiggly!)
Kim Morrison (Apr 20 2025 at 09:27):
Do you have other publicly available examples besides those from NNG? I'm having a bit of trouble finding interesting examples where it succeeds. So far my efforts have only got
example (x y : Nat × Nat) (h : x = y.swap) : y.swap = x := by canonical
example : { x : Nat × Nat // x.1^2 = x.2 + 1} := by canonical
example (x y : Nat) (h : x = y) : x^2 = y^2 := by canonical
(and even there, the proof terms for the 2nd and 3rd are unnecessarily large, but I guess that comes with the territory.)
I have a list of negative examples I could post if you're interested (here or elsewhere).
Chase Norman (Apr 20 2025 at 19:00):
Perhaps you would be interested in the Examples section of the Canonical for Automated Theorem Proving in Lean paper
Chase Norman (Apr 20 2025 at 19:06):
It can be difficult to tell whether there exists a short proof term for a statement or not, and I hope Canonical can be a useful tool to determine this. Since Canonical has no native support for natural numbers, the performance on the NNG is more indicative of reasoning about the foundations of any axiomatic system (indeed, you can replace all instances of Nat with a custom MyNat in the NNG and get the same results).
My next formalization project might be to formalize the properties of the type theory of Canonical. This will involve custom inductive types and foundational proofs by induction on them. I expect Canonical to do quite well there, just as in NNG.
Wang Jingting (Apr 21 2025 at 06:38):
I tried to run Canonical locally both in a new lake project and in the cloned version of that mathlib branch, but in both cases I met the error error loading library, libcanonical.so: cannot open shared object file: No such file or directory. (when using both VSCode and lake build) Is there some way I could deal with this?
The system information is as below (I don't really know what some of these entries mean, just copied them from system information, hopefully it's useful):
System:
Kernel: 6.8.0-51-generic arch: x86_64 bits: 64 compiler: gcc v: 13.3.0 clocksource: tsc
Desktop: Cinnamon v: 6.4.6 tk: GTK v: 3.24.41 wm: Muffin v: 6.4.1 vt: 7 dm: LightDM v: 1.30.0
Distro: Linux Mint 22.1 Xia base: Ubuntu 24.04 noble
Machine:
Type: Laptop System: LENOVO product: 21N5 v: ThinkBook 16p G5 IRX serial: <superuser required>
Chassis: type: 10 v: ThinkBook 16p G5 IRX serial: <superuser required>
Mobo: LENOVO model: LNVNB161216 v: SDK0T76479 WIN serial: <superuser required>
part-nu: LENOVO_MT_21N5_BU_idea_FM_ThinkBook 16p G5 IRX uuid: <superuser required> UEFI: LENOVO
v: P5CN26WW date: 09/03/2024
CPU:
Info: 24-core (8-mt/16-st) model: Intel Core i9-14900HX bits: 64 type: MST AMCP smt: enabled
arch: Raptor Lake rev: 1 cache: L1: 2.1 MiB L2: 32 MiB L3: 36 MiB
Speed (MHz): avg: 1981 high: 5600 min/max: 800/5600:5800:4100 cores: 1: 1230 2: 2974 3: 3010
4: 800 5: 5600 6: 800 7: 800 8: 800 9: 800 10: 800 11: 800 12: 800 13: 3628 14: 800 15: 4600
16: 800 17: 4002 18: 800 19: 800 20: 800 21: 1542 22: 3371 23: 3572 24: 3232 25: 800 26: 1837
27: 3088 28: 4102 29: 800 30: 800 31: 1745 32: 3065 bogomips: 154828
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Chase Norman (Apr 21 2025 at 06:42):
I’ve tested on something similar with Ubuntu x86. I assume your .lake/packages/Canonical/.lake/build/lib folder has libcanonical.so, so it’s a linker issue.
Wang Jingting (Apr 21 2025 at 06:48):
Yes. The file is actually under the path you described.
Wang Jingting (Apr 21 2025 at 06:50):
So is there currently a way to fix this? (Or should I wait until the error is fixed properly?) Thanks!
Chase Norman (Apr 21 2025 at 06:51):
I feel that this would need further investigation to resolve; there’s not a clear fix
Wang Jingting (Apr 21 2025 at 06:54):
OK, Thanks for the reply! Then I'll just wait for a while and see if this error can be fixed. But anyway, I'm still looking forward to having a try!
Niels Voss (Apr 21 2025 at 07:57):
Out of curiosity, did you get this error from lake build or VSCode? According to https://github.com/chasenorman/CanonicalLean/issues/1, Canonical currently only works when built and run from VSCode, due to an issue with loader_path. This is a different error than what you are getting, so it's probably not the issue, but it's worth checking.
Chase Norman (Apr 21 2025 at 12:45):
(deleted)
Chase Norman (Apr 21 2025 at 14:51):
He says both (I take it with the same error). And @loader_path is a MacOS thing. Regardless, Canonical is meant to be used primarily through the VSCode plugin, so I tested with that in mind.
Wang Jingting (Apr 22 2025 at 03:04):
Niels Voss said:
Out of curiosity, did you get this error from
lake buildor VSCode? According to https://github.com/chasenorman/CanonicalLean/issues/1, Canonical currently only works when built and run from VSCode, due to an issue with loader_path. This is a different error than what you are getting, so it's probably not the issue, but it's worth checking.
Yeah, actually I tried both of these methods, and unfortunately, none worked. But anyway, I guess I'll just wait until a fix of that problem has been found. By the way, should I also submit a detailed description in an issue to that repository? @Chase Norman (Since you said you've met the same error, I wonder if submitting an issue is still necessary/recommended)
Edit: I also met the undefined symbol: lean_alloc_small error.
Niels Voss (Apr 22 2025 at 06:43):
He says both (I take it with the same error). And @loader_path is a MacOS thing. Regardless, Canonical is meant to be used primarily through the VSCode plugin, so I tested with that in mind.
It makes sense that you would focus on VSCode, though it's clearly not ideal that lake build always fails in a repository with canonical installed.
Chase Norman (Apr 22 2025 at 07:01):
@Wang Jingting that lean_alloc_small error is much better than the earlier one. I specifically checked that the linux-x86_64 build of Lean has lean_alloc_small in the libleanshared.so. It seems like this depends on the specific installation of Linux, but I still find it hard to imagine that Lean wouldn’t be linking this very necessary primitive.
Chase Norman (Apr 22 2025 at 07:08):
You can verify my work by running nm -D --defined-only ~/.elan/toolchains/leanprover--lean4---v4.18.0/lib/lean/libleanshared.so | grep lean_alloc_small
Chase Norman (Apr 22 2025 at 07:13):
I guess I could add a linker argument to force this library to be linked.
Sebastian Ullrich (Apr 22 2025 at 07:23):
This symbol vanished in 4.19.0 so might there be some Lean version confusion?
Chase Norman (Apr 22 2025 at 07:23):
Oh, what should I use instead?
Sebastian Ullrich (Apr 22 2025 at 07:25):
lean_alloc_small_object is the one Lean itself calls
Sebastian Ullrich (Apr 22 2025 at 07:26):
But if code compiled against 4.18 is run against 4.19, there must be some more fundamental issue
Wang Jingting (Apr 22 2025 at 07:46):
Chase Norman said:
Wang Jingting that
lean_alloc_smallerror is much better than the earlier one. I specifically checked that thelinux-x86_64build of Lean haslean_alloc_smallin thelibleanshared.so. It seems like this depends on the specific installation of Linux, but I still find it hard to imagine that Lean wouldn’t be linking this very necessary primitive.
I remember that the first time the lean_alloc_small error didn't show up on my first attempt, but then I don't know why, but sometimes lake automatically update the lean-toolchain to v4.19.0-rc3 and this starts to appear. So maybe this isn't much to worry if it's only a version problem, I guess I can solve the version problem anyway.
Chase Norman (Apr 22 2025 at 07:47):
It is in fact a version problem
Wang Jingting (Apr 22 2025 at 07:48):
Thanks. I'll try to first fix this easier problem locally.
Edit: Hooray! I don't know exactly what is happening, but things start to work on my laptop now!
Notification Bot (Apr 22 2025 at 09:51):
49 messages were moved here from #general > Duper and Program Synthesis by Patrick Massot.
Wang Jingting (Apr 22 2025 at 13:49):
Kim Morrison said:
Do you have other publicly available examples besides those from NNG? I'm having a bit of trouble finding interesting examples where it succeeds. So far my efforts have only got
example (x y : Nat × Nat) (h : x = y.swap) : y.swap = x := by canonical example : { x : Nat × Nat // x.1^2 = x.2 + 1} := by canonical example (x y : Nat) (h : x = y) : x^2 = y^2 := by canonical(and even there, the proof terms for the 2nd and 3rd are unnecessarily large, but I guess that comes with the territory.)
I have a list of negative examples I could post if you're interested (here or elsewhere).
It seems that the second and third example doesn't work on my computer? The first one returned a result, but the second and third one both returned no proof found, even after I raised the time limit to a larger number.
Deming Xu (Apr 24 2025 at 06:08):
I changed my lean-toolchain file to leanprover/lean4:v4.18.0 and canonical started to work, but then import Mathlib failed because of the version change. I don't know how to downgrade Mathlib to v4.18.0. Whenever I run lake update mathlib, lake always changes the toolchain file back to v4.19.0-rc3 for me...
I think the idea of proving things only from first principles of the language is really cool!
The fact that canonical can find the Cantor diagonal proof on its own is also cool.
I ran these tests:
import Canonical
example : 1 ≠ 2 := by
-- canonical -- Timeout. Here is my proof:
intro h -- 1 = 2. Goal: False
let P (x : Nat) : Prop := match x with
| 1 => True
| _ => False
have h1 : P 1 := True.intro
rw [h] at h1 -- (h1 : P 2) after substitution. Now P 2 is False
exact h1
example : ∃ x : Nat, x * x = 0 := by
-- exact Exists.intro 0 (Eq.refl 0)
-- canonical 10 -- It finds this proof:
exact
Exists.intro Nat.zero
(Eq.rec (motive := fun a t ↦ a = Nat.zero) (Eq.refl Nat.zero)
(Nat.rec (motive := fun t ↦ t = t) (Eq.refl Nat.zero) (fun n n_ih ↦ Eq.refl n.succ)
(Nat.zero.mul Nat.zero)))
example : ∃ x : Nat, x * x = 1 := by
-- exact Exists.intro 1 (Eq.refl 1)
-- canonical 10 -- It finds this proof:
exact
Exists.intro Nat.zero.succ
(Eq.rec (motive := fun a t ↦ a) (Eq.refl Nat.zero.succ)
(Eq.rec (motive := fun a t ↦ (a = a) = (a.mul a = a))
(Eq.refl (Nat.zero.succ = Nat.zero.succ))
(Nat.rec (motive := fun t ↦ Nat.zero.succ = Nat.zero.succ) (Eq.refl Nat.zero.succ)
(fun n n_ih ↦ n_ih) (Nat.zero.mul Nat.zero))))
example : ∃ x : Nat, x * x = 4 := by
-- canonical 10 -- Timeout. Here is my proof:
exact Exists.intro 2 (Eq.refl 4)
canonical failed to prove 1 ≠ 2, but I think this is reansonable because the actual mechanism in Lean is pretty complicated.
It succeeded in proving ∃ x, x * x = 0 and ∃ x, x * x = 1, but the proof seems to have some redundant parts. Running Eq.rec to do substitution is not necessary here. We only need Eq.refl to prove 0*0=0 and 1*1=1. The whole Eq.rec block from canonical has type 0*0=0, 1*1=1, so they should be replaced with refl. canonical failed on ∃ x, x * x = 4.
I also tried import Mathlib.Tactic and replacing canonical with aesop. Aesop also failed the ∃ x, x * x = 4 test and succeeded in proving ∃ x, x * x = 0 and ∃ x, x * x = 1, so I think canonical achieved similar performance to aesop with less domain-specific knowledge.
Maybe manually optimizing for Eq in canonical will make the search faster. For example when a new search node requires a proof of a = b, always try Eq.refl by definitional equality first.
Kevin Buzzard (Apr 24 2025 at 08:01):
@Deming Xu it's possible to say which version of mathlib you want in the config file of your project. Do you have a lakefile.lean or a lakefile.toml?
Kevin Buzzard (Apr 24 2025 at 08:02):
Once you change that file, lake update mathlib will "update" mathlib to the commit or branch or tag specified in the config file.
Sebastian Ullrich (Apr 24 2025 at 08:06):
(It would be great to document these dependency pinning instructions at https://github.com/leanprover-community/mathlib4/wiki/Using-mathlib4-as-a-dependency)
Yaël Dillies (Apr 24 2025 at 10:12):
Yes sorry I told Johan I would update those instructions two weeks ago, but it slipped off the radar :sweat:
𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Apr 25 2025 at 00:19):
Deming Xu said:
Maybe manually optimizing for
Eqin canonical will make the search faster. For example when a new search node requires a proof ofa = b, always tryEq.reflby definitional equality first.
This could be reasonable in some contexts, but in others an equality may be provable by a 'heavy rfl' which you do not want to use because it will take the kernel ages to compute through it. A slider which tells a proof search procedure how 'heavy' of a rfl is considered acceptable seems like the most flexible control for this.
Deming Xu (Apr 25 2025 at 02:21):
It worked. Thank you!
package «lean_proj» where
require mathlib from git
"https://github.com/leanprover-community/mathlib4.git" @ "v4.18.0"
require Canonical from git
"https://github.com/chasenorman/CanonicalLean.git"
I did these things:
- Change my
lean-toolchainfile toleanprover/lean4:v4.18.0 - Add the
@ "v4.18.0"to therequire mathlibline in mylakefile.lean - Delete the
.lakefolder and thelake-manifest.jsonfile - Run
lake update mathlib - Run
Reload Windowin VSCode
I thought the lean-toolchain file controlled the toolchain version, but it seems like mathlib determines it instead.
Chase Norman (Apr 25 2025 at 03:11):
@Deming Xu In short, Canonical does as you say, preferring Eq.refl to Eq.rec where possible. However, there are some complexities here due to definitional equality. Canonical decides to work on the goal x * x = 0 before the goal x : Nat, and since x * x does not definitionally reduce to 0, Eq.refl fails in the default setting.
Chase Norman (Apr 25 2025 at 03:13):
This is what the +synth flag is for, but it is currently an unstable feature.
Kim Morrison (Apr 25 2025 at 09:32):
Deming Xu said:
I thought the
lean-toolchainfile controlled the toolchain version, but it seems like mathlib determines it instead.
lean-toolchain does control the toolchain version! The subtlety is that lake update will modify it if the updated dependencies all agree on some later toolchain.
Chase Norman (May 03 2025 at 03:04):
Kim Morrison said:
I'm trying it out in VSCode a bit. Can I suggest that when you find a proof and show the info message, you also discharge the goal with sorry, so as to avoid a red squiggly (which nearly obscures the blue squiggly!)
I have implemented this on the latest 4.19.0 version of CanonicalLean.
Patrick Massot (May 03 2025 at 16:27):
Is there any hope to use canonical to prove
import Mathlib.Data.Real.Basic
def continuous_function_at (f : ℝ → ℝ) (x₀ : ℝ) :=
∀ ε > 0, ∃ δ > 0, ∀ x, |x - x₀| ≤ δ → |f x - f x₀| ≤ ε
def sequence_tendsto (u : ℕ → ℝ) (l : ℝ) :=
∀ ε > 0, ∃ N, ∀ n ≥ N, |u n - l| ≤ ε
example (f : ℝ → ℝ) (u : ℕ → ℝ) (x₀ : ℝ)
(hf : continuous_function_at f x₀) (hu : sequence_tendsto u x₀) :
sequence_tendsto (f ∘ u) (f x₀) :=
fun ε ε_pos ↦ (hf ε ε_pos).elim
fun δ ⟨δ_pos, hδ⟩ ↦ (hu δ δ_pos).elim
fun N hN ↦ ⟨N, fun n n_ge ↦ hδ (u n) <| hN n n_ge⟩
This is the example that I always use in talks when I want to show a proof that simply uses logic and two definitions without any lemma. I was not able to prove any part of it using canonical, even the last subterm (starting with fun n n_ge) that solve_by_elim or tauto find immediately.
Chase Norman (May 03 2025 at 16:46):
I will see to it that Canonical solves this instance, it is well within scope. I just need to make sure that we properly encode the notions in this proof. Would you mind submitting a “Performance Bug” GitHub issue on https://github.com/chasenorman/CanonicalLean
Chase Norman (May 03 2025 at 16:46):
Patrick Massot
Kim Morrison (May 04 2025 at 06:57):
@Chase Norman could you say something about what additional encoding will be necessary? Maybe this would be helpful for users learning what is (currently) in scope.
Chase Norman (May 04 2025 at 18:24):
@Kim Morrison @Patrick Massot In general, whnf with TransparencyMode.instances is able to eliminate instances from the problem:
#eval whnf' `(@HAdd.hAdd Real Real Real (@instHAdd Real Real.instAdd) x x)
-- Real.add✝ x x
Unfortunately this does not happen for the absolute value (unless a lower transparency level is set):
#eval whnf' `(@abs Real Real.lattice Real.instAddGroup x)
-- @abs Real Real.lattice Real.instAddGroup x
If we don't get a translation that reduces to Real.sup✝, we end up bringing in abs, max, SemilatticeSup.toMax, SemilatticeSup.sup, Real.instSemilatticeSup, inferInstance, Lattice.toSemilatticeSup, Real.lattice, DistribLattice.toLattice, Real.instMax and all of the types of the associated fields, cascading recursively. So, I don't think this is an issue of "certain problems work and others don't" but rather the need for a more robust way of handling typeclass instances, perhaps with lean-auto.
Jacob Loader (May 07 2025 at 12:42):
I gave this a try and found a small bug (please lmk if there is a better place for bug reports). Using canonical, I am given the following proof:
def Nat.isEven : ℕ → Prop := fun n => ∃ k : ℕ, n = 2 * k
example (n : ℕ) (h : isEven n) : isEven (n + 2) := by
exact
Exists.rec (motive := fun t ↦ ∃ a, n.succ.succ = zero.succ.succ.mul a)
(fun w h ↦
Exists.intro w.succ
(rec (motive := fun t ↦ n.succ.succ = zero.succ.succ.mul w.succ)
(Eq.rec (motive := fun a t ↦ n.succ.succ = a.succ.succ)
(rec (motive := fun t ↦ t.succ.succ = t.succ.succ) (Eq.refl zero.succ.succ)
(fun n n_ih ↦ Eq.refl n.succ.succ.succ) n)
h)
(fun n_1 n_ih ↦ n_ih) w))
But Lean doesn't accept this proof, on the final instance of n_ih I get the error
type mismatch
n_ih
has type
(fun t ↦ n.succ.succ = zero.succ.succ.mul ?m.444) n_1 : Prop
but is expected to have type
(fun t ↦ n.succ.succ = zero.succ.succ.mul ?m.444) n_1.succ : Prop
Interestingly, if I replace the final n_ih with "by exact n_ih" the proof does work so it seems to me that Canonical just expects Lean to do more beta-reduction than it does by default. I'm using Lean 4.20.0-rc3 (and the associated Canonical version)
Chase Norman (May 07 2025 at 17:04):
@Jacob Loader Yes, usually strategically adding by exact in some places fixes these roundtripping issues. The term is a valid Lean term, but there are no guarantees when it comes to pretty printing. I received a GitHub issue for this, but I think it will require working with the FRO.
Jared green (May 12 2025 at 19:01):
can you give us a way to add heuristics of our own to guide Canonical's search (that is, if current depth + f (node) > (max depth) then backtrack)? also expose functions like the list of hole types and the nearest complete term(with or without matching type to the goal) for this purpose?
Chase Norman (May 12 2025 at 19:36):
@Jared green Our heuristics file can be found at this link. Much of the information you mention is available (although, it might not be computationally efficient to obtain every step). We are definitely looking for improvements to our heuristics, as they are still quite simple.
Chase Norman (May 12 2025 at 19:56):
Also, please share any problems Canonical takes long on, so it can improve. This is what the Performance Bug issue template on the CanonicalLean repository is for.
Jared green (May 12 2025 at 20:04):
i was thinking of using it to do symbolic regression on things like images(the heuristic would be a function of the distance from the nearest term matching the type to a target value)
Villfuk 02 (May 17 2025 at 14:42):
Hey, I'm new to lean and I can't figure out how to install Canonical.
I'm using vscode on windows
what are the steps to set it up?
Chase Norman (May 17 2025 at 14:52):
When you start a project, there should be a lakefile.toml in the main directory. Add the following lines:
[[require]]
name = "Canonical"
scope = "chasenorman"
After you save, you should run lake update Canonical in the terminal.
Chase Norman (May 17 2025 at 14:52):
@Villfuk 02
Chase Norman (Jun 04 2025 at 21:04):
The first structure editor for Lean was added with the v4.20.0 release of Canonical.
Prove theorems, implement functions, and construct objects just by clicking.
Marcus Rossel (Jun 05 2025 at 13:13):
This could be interesting for the Lean for Teaching channel, too. Seems like it would be useful for teaching things like recursors.
Patrick Massot (Jun 05 2025 at 17:25):
The Lean for teaching channel is specifically not about teaching Lean, it’s about teaching using Lean.
Marcus Rossel (Jun 05 2025 at 18:38):
Yeah, I'm thinking of courses that use Lean to teach type theory.
David Xu (Jun 16 2025 at 12:01):
I've been wondering—could Canonical (or similar tools) eventually evolve into a constrained decoding platform for Lean, supporting the training of "first-principle-based" proof models?
As discussed in this thread, I think it's worth exploring whether we can train a mathematical model that doesn't rely on LLMs or human mathematical knowledge, but only on proof verification and first principles.
So far, in this direction, I’ve found this paper:
https://arxiv.org/html/2407.00695
Their approach is based on constrained decoding in a custom proof language:
-
Traditional LLMs repeatedly predict the next “token” of a piece of text (context/chat history). By repeating this process, the model can generate complete text one token at a time.
-
When predicting the next token, the model outputs a probability distribution over all possible tokens in the vocabulary.
-
In constrained decoding, the text being generated has a well-defined “valid syntax.” If a predicted token would violate that syntax, we forcibly set its probability to zero.
-
For example, when generating JSON, after
{"name": "John Doe", "age":, the next token can’t be,or}, but88or"hi"would be valid. So we set the probability of,and}to zero and renormalize the remaining probabilities. To my knowledge, current generation engines like SGLang and vLLM already support constrained outputs like valid JSON or outputs conforming to regex patterns.
https://leanprover.zulipchat.com/user_uploads/3121/vkcqBUt2nVTuTTIlYlBpQiJJ/refinement.gif
https://leanprover.zulipchat.com/#narrow/channel/113488-general/topic/Canonical/near/522451293
I think this approach is applicable to Lean. A few days ago, when I saw canonical's “structure editor,” I realized it’s actually a form of constrained decoding. Each hole in a partially written proof has a list of valid symbols that can be inserted—so we’re only allowed to pick a “next token” from this list.
-
If there are no more holes, the proof is complete.
-
If a hole has no valid candidates, the proof attempt has failed.
If we can programmatically interact with this interface in Python, we could train a model to predict the “next token”—or you might say, the next proof search step. (Via supervised learning, we could train the model to predict tokens that match actual Lean code closely. More advanced approaches might also be possible.)
But in reality, very few people seem to have explored constrained decoding in Lean.
(Some tactic-tree search approaches in earlier papers might count as a kind of constrained decoding, but since “writing a tactic line” is much more complex than “filling in a token,” these tactic searches couldn’t provide helpful “valid next token” lists. This makes it difficult to train non-LLM, first-principles-based models, because they don’t get useful signals early in training.)
Canonical seems to be the software closest to enabling constrained decoding for Lean. It might be able to evolve into a platform that reduces the amount of code needed for such experiments—maybe allowing a lot more people to get started.
The main challenges are:
-
Implementing a constrained decoding checker inside Lean is very complex and time-consuming.
-
Training a model from scratch on a new task type requires more parameter tuning, debugging, and model design—harder than simply using an LLM with built-in knowledge.
-
The set of valid tokens depends on which lemmas you allow in your proof. So you need a lemma selector.
-
The resulting model won’t be as flexible or general-purpose as an LLM.
But if someone manages to train a true “first-principles” model, there could be major benefits:
-
Just as different data types (images, audio, text, graphs) need different neural network designs (CNN/RNN/Transformers/GNN; autoregressive/diffusion), a model designed specifically for “constrained proof search” rather than text generation might be more efficient—smaller, faster, cheaper.
-
The model’s architecture could reveal fundamental differences between proofs and natural language.
-
Since it doesn’t rely on natural language priors, it might explore stranger mathematics or more obscure domains more effectively.
Frederick Pu (Jun 16 2025 at 14:14):
Rwkv and other state based models is good choice for this because it has rnn style forward pass with a very fast backward past. So you don't need to recompute the entire context when doing tree search
Chase Norman (Jun 16 2025 at 15:00):
You can watch my video on how and why I think this should be done here: https://youtu.be/aOSScvKEwVU?feature=shared
Also, Gabriel Poesia is a (upcoming) collaborator of mine.
Chase Norman (Jun 16 2025 at 15:12):
Nothing would need to be written in Lean. Canonical is a Rust program
Frederick Pu (Jun 26 2025 at 02:53):
How does linarith and other computer algebra tactics fit into canonical. It seems that proofs involving complex inequalities and stuff would be quicker to search in aesop.
Chase Norman (Jun 26 2025 at 02:56):
For now, if a theorem does not have a short proof term, canonical isn’t the right tool. Theory reasoning, like linear algebra solving, is an interesting potential future direction, though.
Frederick Pu (Jun 26 2025 at 11:31):
so does canonical have the same search procedure as aesop but with a simpler IR which make it faster?
Bolton Bailey (Jun 26 2025 at 12:29):
Can canonical handle indexed inductive types? I was curious if the sorry in this example is the kind of thing canonical could solve.
Chase Norman (Jun 26 2025 at 17:25):
Here's a proof.
theorem FullDataTree.toLeafDataTree_getValueAtIndex {s} {α : Type}
(tree : FullDataTree α s) (idx : SkeletonLeafIndex s) :
tree.toLeafDataTree.getValueAtIndex idx =
tree.getValueAtIndex idx.toNodeIndex :=
match tree, idx with
| FullDataTree.leaf value, SkeletonLeafIndex.ofLeaf => rfl
| FullDataTree.internal _ left _, SkeletonLeafIndex.ofLeft idxLeft =>
FullDataTree.toLeafDataTree_getValueAtIndex left idxLeft
| FullDataTree.internal _ _ right, SkeletonLeafIndex.ofRight idxRight =>
FullDataTree.toLeafDataTree_getValueAtIndex right idxRight
Canonical has rich support for indexed inductive types, but it has trouble ruling out absurd patterns. If you open an induction over tree, idx (I'm not actually sure how to do this, I did it with match and manually writing the recursive call) then each of the cases is trivial. Sorry if this answer is unsatisfying.
Chase Norman (Jun 26 2025 at 17:26):
Note that the proof term for this (despite appearances) is very long, as the compiler has to argue those absurd patterns and termination.
Aaron Liu (Jun 26 2025 at 17:28):
Chase Norman said:
If you open an induction over
tree, idx(I'm not actually sure how to do this, I did it withmatchand manually writing the recursive call) then each of the cases is trivial.
It's actually just an induction over tree that also happens to match on the idx
Chase Norman (Jun 26 2025 at 17:31):
For me, I still have to case on the idx.
theorem FullDataTree.toLeafDataTree_getValueAtIndex {s} {α : Type}
(tree : FullDataTree α s) (idx : SkeletonLeafIndex s) :
tree.toLeafDataTree.getValueAtIndex idx =
tree.getValueAtIndex idx.toNodeIndex := by
induction tree with
| leaf value => cases idx with
| ofLeaf => canonical
| internal value left right ihLeft ihRight => cases idx with
| ofLeft idxLeft =>
canonical
| ofRight idxRight =>
canonical
But Canonical can easily fill in the trivial cases from here.
Chase Norman (Jun 30 2025 at 20:14):
Canonical v4.21.0 can use simp!
Canonical now uses simp lemmas, equation compiler lemmas, and the provided premises to construct simp only and simpa only calls at various points in the proof. This makes Canonical much more powerful.
Which tactics should Canonical incorporate next?
Jared green (Jun 30 2025 at 20:25):
rw?
Jared green (Jun 30 2025 at 20:26):
nth_rewrite?
Frederick Pu (Jun 30 2025 at 22:14):
what's the general approach to incorporating custom tactics into canonical? Does canonical return the search state to lean after a certain search depth? or are lean tactics treated as opaque canonical rules. I think if this can be done generally for linarith, linear_combination and omega and other atomic proof automation tactics this could be a very strong candidate for replacing aesop
Asei Inoue (Jun 30 2025 at 22:39):
which tactics should Canonical incorporate next?
omega!
Jared green (Jul 01 2025 at 04:10):
canonical does not build the entire search space all at once, only maintaining the branch it is on at a time, returning the successful ones.
Chase Norman (Jul 01 2025 at 16:57):
Canonical does its reasoning entirely without interacting with Lean, even when generating simp invocations. Canonical can now perform reasoning modulo definitional reduction rules.
The tactic converts the Lean definitional equalities, simp lemmas, and other equality lemmas into definitional reduction rules. These define an equivalence relation among terms and Canonical only has to reason about the quotient. So, in a way, simp is pervasively applied everywhere. To convince the Lean kernel, the tactic inserts simp calls.
Chase Norman (Jul 01 2025 at 16:58):
You can see how much easier it is for Canonical to prove things in the presence of definitional reductions using the +refine UI. Canonical doesn’t have to choose to use simp or what to use it on, it’s all inherent in the search.
Mario Carneiro (Jul 01 2025 at 17:00):
how well does it handle non-directed defeqs?
Mario Carneiro (Jul 01 2025 at 17:00):
you might be able to use at least some grind lemmas the same way you use simp lemmas
Chase Norman (Jul 01 2025 at 17:01):
I think I would need e-graphs for that. It appears that type theories that use e-graphs for definitional equality are not well studied.
Chase Norman (Jul 01 2025 at 17:03):
But my performance budget is pretty tight. The routine for definitional reduction is invoked about a hundred million times per second
Jared green (Jul 02 2025 at 14:23):
what about non-definitional equality?(is that even a thing?)
Chase Norman (Jul 02 2025 at 14:34):
@Jared green the equations I’m talking about are arbitrary (terminating) rewrite rules. They are not generally definitional in Lean, but they become definitional in Canonical’s theory.
Jared green (Jul 02 2025 at 16:10):
how about breadth first case splitting with clause learning?
Jared green (Jul 02 2025 at 16:12):
(i know, nobody has implemented that yet, but it can be done)
Chase Norman (Jul 02 2025 at 16:22):
I’ve thought for some time about what the constructive DTT analogue of clause learning would be, but I’m still not sure. If any formal methods people want to work with me on this, reach out.
Jared green (Jul 02 2025 at 16:38):
what is your best guess currently?
Chase Norman (Jul 02 2025 at 21:02):
Well, if you have a type theory judgment that is falsified, you can consider the conflict set to be the metavariables that were involved in reaching that failure state. You can propagate this to behave similarly to CDCL, but I've found that it doesn't work very well in practice. This is because most branches are infinite and do not get entirely falsified.
The alternative would be define some propositional notion of conflict, like attempting to prove both a statement and its negation for each metavariable representing a proposition. If the negation ends up being proven, you can somehow eliminate similar goals from being attempted in the future. This is all very nebulous, though.
Jared green (Jul 02 2025 at 21:53):
i didnt think conflict finding would be done on every single branch either
Jared green (Jul 05 2025 at 01:57):
how long is canonical worth running now anyway?
Chase Norman (Jul 05 2025 at 03:57):
Most statements that are not solved within one second are as a result of the translation process, rather than the Canonical solver (i.e. the desired proof term is not in the search space). Significant improvements to the fidelity of the Lean embedding/encoding are planned, perhaps rectifying this.
Jared green (Jul 05 2025 at 14:10):
what translation? i thought it was a deep embedding.
Jared green (Jul 05 2025 at 14:32):
i thought the reason it would only be worth running 1 second was because the success rate drops off for longer runtimes, so a better version would have a longer horizon.
Chase Norman (Jul 05 2025 at 14:38):
The embedding is quite shallow. Translation improvements generally intend to expose more of the problem at a more shallow level.
Andrés Goens (Aug 03 2025 at 13:43):
Chase Norman said:
I think I would need e-graphs for that. It appears that type theories that use e-graphs for definitional equality are not well studied.
just curious, do you think it would make sense to connect this with (lean-)egg for stuff like this? or not really because it would be too much for your comptational budget (i.e. is this what you meant with that)?
Chase Norman (Aug 03 2025 at 14:18):
@Andrés Goens I think using e-graphs is promising, and I have an idea for how to develop this further. However, I'm not sure if off-the-shelf e-graph implementations will work.
Canonical relies on the ability to perform definitional equality checks on terms that contain metavariables. Definitional equality should only query as many head symbols as is necessary to falsify the equality. If one of these head symbols is a metavariable, the algorithm needs to pause and wait until it is assigned, at which point the algorithm resumes.
I am interested in talking about this more, perhaps in DMs.
Chase Norman (Aug 15 2025 at 23:43):
Canonical v4.22.0 has support for typeclass instances!
Powered by the new monomorphize tactic, Canonical can now solve a range of goals involving typeclass instances. Also, the Lean tactic has been rewritten from scratch. Enjoy!
import Canonical
import Mathlib.Data.Real.Basic
import Mathlib.Data.ZMod.Basic
-- Monomorphized premise
example (R : Type*) [CommRing R] (x : R) : x + 2 = 2 + x := by
-- canonical [add_comm]
exact add_comm x ↑2
-- Monomorphized simp lemma
example (R : Type*) [Monoid R] (a b c : R) : a * b * c = a * (b * c) := by
-- canonical [mul_assoc]
simp only [mul_assoc] <;> exact Eq.refl (a * (b * c))
-- Monomorphized operation (notice `+`!)
example (a b : Nat) : a + b = b + a := by
-- by canonical
exact
Nat.rec (motive := fun t ↦ t + b = b + t)
(by simp only [Nat.zero_add, Nat.add_zero] <;> exact Eq.refl b)
(fun n n_ih ↦ by simp only [Nat.add_succ, Nat.succ.injEq, Nat.succ_add] <;> exact n_ih) a
-- synthInstance
example [Finite ℕ] : False := by
-- canonical [not_finite]
exact not_finite ℕ
-- Instance in local context
example [Inhabited X] : X := by
-- canonical [default]
exact default
-- Synthesis during reconstruction
axiom HashSet (α : Type) : Type
axiom HashSet.empty [BEq α] [Hashable α] : HashSet α
noncomputable example : HashSet Nat := by
-- canonical [HashSet.empty]
exact HashSet.empty
-- Creating instances
example : Inhabited Nat := by
-- canonical
exact { default := Nat.zero }
Patrick Massot (Aug 16 2025 at 07:50):
Did you make any progress on https://github.com/chasenorman/CanonicalLean/issues/10?
Chase Norman (Aug 16 2025 at 20:33):
One more feature will be necessary to solve it
Patrick Massot (Aug 16 2025 at 21:08):
In case you need extra motivation: from August 21st to September 5th I’ll have three opportunities to show this example in propaganda talks (in Hangzhou, Hanoi and Paris). I’d love to end the demo by turning the whole proof into by canonical.
Chase Norman (Aug 17 2025 at 05:35):
So it shall be.
Jared green (Aug 17 2025 at 13:43):
what is this missing feature?
Chase Norman (Aug 20 2025 at 06:06):
Projecting out of structure types is a forward reasoning step, and Canonical only performs backward reasoning. So, we are changing the encoding of structure types. More details to follow
Chase Norman (Sep 07 2025 at 18:55):
I want to highlight that Canonical now solves an example posted by @David Xu:
import Canonical
example : ∃ n : Nat, n * n = 4 :=
-- by canonical
Exists.intro 2 (by
simp only [Nat.zero_add, Nat.zero_mul, Nat.succ_mul, Nat.succ.injEq, Nat.succ_add] <;>
exact Eq.refl Nat.zero)
I previously explained the failure, but subsequently found a way to have Canonical consider the possibility that a definitional reduction will take place.
Patrick Massot (Sep 07 2025 at 19:56):
Note it also proves that continuous functions are sequentially continuous as described in https://github.com/chasenorman/CanonicalLean/issues/10.
Chase Norman (Sep 08 2025 at 13:36):
Canonical now applies destruct pervasively, allowing much stronger reasoning about structure types and existential quantifiers. Here is the proof @Patrick Massot mentioned:
example (f : ℝ → ℝ) (u : ℕ → ℝ) (x₀ : ℝ)
(hf : continuous_function_at f x₀) (hu : sequence_tendsto u x₀) :
sequence_tendsto (f ∘ u) (f x₀) :=
-- by canonical
fun ε a ↦
Exists.intro (hu (hf ε a).choose (Exists.choose_spec (hf ε a)).1).choose fun n a_1 ↦
(Exists.choose_spec (hf ε a)).2 (u n)
(Exists.choose_spec (hu (hf ε a).choose (Exists.choose_spec (hf ε a)).1) n a_1)
Alfredo Moreira-Rosa (Sep 08 2025 at 20:32):
maybe a dumb question here and also maybe completely out of scope of canonical, but would it be possible to have some kind automated conversion from term proof to tactic proof so that generated proof read easier for people not familiar with term syntax?
Chase Norman (Sep 08 2025 at 21:19):
It would be possible to write code that works in some cases. Tactics are independent metaprograms that can execute arbitrary code (and change from version to version), so there is no analytical inverse. Terms are fixed in meaning and are conceptually a lot simpler—just functions, applications, and variables. This is what allows us to search for them and be assured that the proof is correct. Anyways, this could be a project for someone that knows the tactics in Lean better than I do.
Chase Norman (Sep 08 2025 at 21:21):
Also, presumably terms are preferred for non-propositions.
Asei Inoue (Sep 08 2025 at 21:26):
@Chase Norman
Would it be possible to make it optional to turn off canonical’s suggestion of terms? Sometimes all that’s needed is to simply close the goal with by canonical, and the explicit display of the proof term isn’t important.
Chase Norman (Sep 08 2025 at 21:39):
I could add a canonical! tactic that does this. But you’d assume the risk of non-deterministic output/success, and calling expensive Rust FFI with each compilation.
Frederick Pu (Sep 08 2025 at 21:52):
maybe the proof term could be cached and ffi is only called again if the cached proof term fails to discharge the goal
Kim Morrison (Sep 08 2025 at 23:16):
Unfortunately there's no caching mechanism available, as far as I understand. By the time you realise a cache is invalid, it's potentially hard for the user to work out how to fix it
Chase Norman (Sep 09 2025 at 23:21):
import Canonical
import PremiseSelection
open Lean PremiseSelection
set_premise_selector
Cloud.premiseSelector
<|> mepoSelector (useRarity := true)
example {x} (p : x < 1) : x = 0 :=
-- by canonical +premises
by simpa only [Nat.div_one] using Nat.div_eq_of_lt p
Canonical also has support for Lean's premise selection API
Kim Morrison (Sep 10 2025 at 02:40):
Nice work! I think this is the first public integration with the premise selection API.
(I would love to tune the parameters in MePo, which currently are just copied from the Isabelle implementation.)
Kim Morrison (Sep 10 2025 at 02:40):
And we really need more backends available to everyone.
Chase Norman (Sep 15 2025 at 14:22):
Canonical v4.23.0 and v4.24.0-rc1 are available now. By popular demand, Canonical will start supporting release candidate versions. This is also in preparation for ITP 2025—I look forward to meeting some of you there.
Kim Morrison (Oct 27 2025 at 00:29):
@Chase Norman what are the dependencies of the Canonical lean library? (e.g. python, rust, etc)? What happens if they are not available? (I'm happy to investigate myself, but if you have ready answers that's easier. :-)
Chase Norman (Oct 27 2025 at 00:32):
@Kim Morrison There are no dependencies. Canonical is precompiled for Mac, Linux, and Windows and runs without Rust installed.
Kim Morrison (Oct 27 2025 at 00:33):
Lovely. :-)
Kim Morrison (Oct 27 2025 at 00:33):
I'm investigating what it takes to add it as a Mathlib require.
Kim Morrison (Oct 27 2025 at 00:33):
We will need to merge #30952 first to allow lake exe cache integration.
Kim Morrison (Oct 27 2025 at 00:34):
(@Patrick Massot, pinging you here as you asked about progress on this front.)
Kim Morrison (Oct 27 2025 at 00:48):
#30953 is the current experiment. It won't pass CI or provide cached oleans until (at least) we have #30844 and #30952 merged, but it appears to work fine locally.
If anyone would like to check it out and confirm that lake build && lake test works for them that would be helpful signal.
Kim Morrison (Oct 27 2025 at 00:48):
However there are separate problems in CI.
@Chase Norman, could you take a look at https://github.com/leanprover-community/mathlib4/actions/runs/18826299417/job/53709575129?pr=30953#step:20:31 and tell me if you have a guess as to what is going wrong?
Kim Morrison (Oct 27 2025 at 00:51):
Ah, this is (I think) just that we need to merge #30952 first.
Kim Morrison (Oct 27 2025 at 00:52):
I'll pause here until we have some CI-savvy maintainers available. :-)
Kim Morrison (Oct 27 2025 at 23:48):
Okay, those PRs are merged, but CI for #30953 is still failing.
@Chase Norman, if I start with a fresh checkout, at what step is the .so file actually been retrieved?
In our current CI setup, it's only during lake env and lake exe cache get that the build has network access.
Chase Norman (Oct 27 2025 at 23:49):
@Kim Morrison It is when Canonical is pulled from GitHub
Kim Morrison (Oct 27 2025 at 23:49):
I see that on the canonical branch, if I rm -rf .lake, then lake env creates lake/packages/Canonical/.lake, but not .lake/packages/Canonical/.lake/build/lib/libCanonical.so.
Chase Norman (Oct 27 2025 at 23:50):
For some reason it is not deciding to pull the precompiled version
Kim Morrison (Oct 27 2025 at 23:50):
So it isn't necessarily pulled at the same time as .lake/packages/Canonical/Canonical.lean appears.
Kim Morrison (Oct 27 2025 at 23:51):
Okay, can I hand this over to you to investigate?
Kim Morrison (Nov 05 2025 at 06:31):
Any ideas on this one, @Chase Norman?
Chase Norman (Nov 05 2025 at 07:29):
We might need to rope @Mac Malone into this. I’ve added an extraDepTargets line to the lakefile to try to force the archive to be pulled, but for some reason it does not do so when there is already a lake-manifest.json present.
Mac Malone (Nov 05 2025 at 07:33):
In Mathlib, I feel the proper time to pull any build-related data from the network is during lake exe cache get. So I would suggest doing something similar to the current ProofWidgets special case for Canonical.
𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Nov 05 2025 at 20:40):
I haven't thought about this for a while now, but I am not excited about that special case existing. It is clearly not scalable to special-case every package that relies on binary artifacts that are not easily compilable on the downstream user's machine. How far away are we from having a mode of require that enforces that prebuilt release data is downloaded in Lake?
Mac Malone (Nov 05 2025 at 20:43):
@𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 Under normal circumstances, the right place for downlaoding build data in Lake is in lake build. It is a limitation of Mathlib's current cache (and CI) structure that this is done in cache.
Mac Malone (Nov 05 2025 at 20:45):
Special casing in Mathlib seems unavoidable anyway, as being explicit about what should be downloaded is desirable for security reasons.
𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Nov 05 2025 at 20:50):
Mac Malone said:
𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 Under normal circumstances, the right place for downlaoding build data in Lake is in
lake build. It is a limitation of Mathlib's current cache (and CI) structure that this is done in cache.
Hm, then maybe a better question would be: is mathlib forever going to be doing strange things that make it not-just-a-normal-package, or is there a path towards adding whatever is needed to make cache work well for any package into Lake?
Mac Malone (Nov 05 2025 at 20:52):
The hope is that Mathlib will eventually be implemntedable via a lakefile.toml and cache will be integrated with lake and together these can be used to solve the current cache and security challenges.
𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 (Nov 05 2025 at 20:52):
Mac Malone said:
Special casing in Mathlib seems unavoidable anyway, as being explicit about what should be downloaded is desirable for security reasons.
Do you mean that allowing a package to fetch prebuilt artifacts is putting more trust on it than just requireing it? That doesn't seem right, given that lakefiles (and Lean modules) can run arbitrary code.
Chase Norman (Nov 05 2025 at 20:56):
@Mac Malone could we write a version of this special case that is generic over archive name and repo? Essentially, ProofWidgets and Canonical could copy their release info from the lakefile into this function and it would perform the install during cache get?
Mac Malone (Nov 05 2025 at 21:00):
𝚠𝚘𝚓𝚌𝚒𝚎𝚌𝚑 𝚗𝚊𝚠𝚛𝚘𝚌𝚔𝚒 The difference is that some lake actionas are only run on code from master vs with the pull request. My understanding is that the network code only runs under the master. lake build, inversely, runs with respect to the PR and without network access.
Chase Norman (Nov 05 2025 at 21:01):
Does lake build have to run without network? Is this the security issue?
Mac Malone (Nov 05 2025 at 21:01):
Yes, that is the decision as of now (as I understand).
Last updated: Dec 20 2025 at 21:32 UTC
