Zulip Chat Archive
Stream: mathlib4
Topic: slim_check question
Kevin Buzzard (Jan 13 2024 at 13:08):
Should this happen? Am I misunderstanding something?
import Mathlib.Data.Real.GoldenRatio
import Mathlib.Tactic
open BigOperators
example (q : ℕ) : Nat.fib (2 * q + 1) * 2 ^ (2 * q) =
∑ k in Finset.range (q + 1), 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
slim_check
/-
Found problems!
q := 1
issue: 8 = 6 does not hold
(0 shrinks)
-/
sorry
example (q : ℕ) : Nat.fib (2 * q + 1) * 2 ^ (2 * q) =
∑ k in Finset.range (q + 1), 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
have hq : q = 1 := sorry -- test this case in the kernel
subst hq
rfl -- works fine??
(context: I am currently totally confused about whether the thing I'm trying to prove is true)
Eric Wieser (Jan 13 2024 at 13:56):
set_option trace.slim_check.decoration true
shows that something is going wrong
Eric Wieser (Jan 13 2024 at 13:56):
It replaced one of the q
s with k
!
Eric Wieser (Jan 13 2024 at 13:58):
I would guess the bug is this raw matching on forallE
: https://github.com/leanprover-community/mathlib4/blob/2523a65bd2b94019a3afb291e9499abcaa0f7c0a/Mathlib/Testing/SlimCheck/Testable.lean#L514C19-L519
Henrik Böving (Jan 13 2024 at 14:00):
Adding the data back in there in L518 is definitely wrong I would say? you're definitely supposed to recompute it after potentially changing the expression right?
Kevin Buzzard (Jan 13 2024 at 14:08):
FWIW here's slim_check
finding problems with a definitely-true goal:
import Mathlib.Algebra.BigOperators.Basic
import Mathlib.Tactic.SlimCheck
open BigOperators
example (q : ℕ) : q = 0 ∨ q ≥ 2 ∨
8 = ∑ k in Finset.range 2, 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
slim_check -- claims to find problem with q=1
example (q : ℕ) : q = 0 ∨ q ≥ 2 ∨
8 = ∑ k in Finset.range 2, 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
match q with
| 0 => left; rfl
| 1 => right; right; rfl
| (n + 2) => right; left; simp only [ge_iff_le, le_add_iff_nonneg_left, zero_le]
-- but there are no problems
Henrik Böving (Jan 13 2024 at 14:17):
Kevin Buzzard said:
FWIW here's
slim_check
finding problems with a definitely-true goal:import Mathlib.Algebra.BigOperators.Basic import Mathlib.Tactic.SlimCheck open BigOperators example (q : ℕ) : q = 0 ∨ q ≥ 2 ∨ 8 = ∑ k in Finset.range 2, 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by slim_check -- claims to find problem with q=1 example (q : ℕ) : q = 0 ∨ q ≥ 2 ∨ 8 = ∑ k in Finset.range 2, 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by match q with | 0 => left; rfl | 1 => right; right; rfl | (n + 2) => right; left; simp only [ge_iff_le, le_add_iff_nonneg_left, zero_le] -- but there are no problems
I do btw find this rather amusing because we actually made slimcheck proof producing to stop this from happening, but I guess when we change the statement its still possible to cheat :D
Eric Wieser (Jan 13 2024 at 14:20):
I think using mapForallTelescope
is almost the right tool here, but I can't find out how to force it to do one binder at a time
Eric Wieser (Jan 13 2024 at 14:21):
Almost of the forallTelescope
API seems to be greedy, and I can't see an easy way to move inside a single binder
Eric Wieser (Jan 13 2024 at 14:32):
#9722 is my attempt at a fix
Eric Rodriguez (Jan 13 2024 at 15:28):
Didn't @Joachim Breitner run into something similar with loogle about wanting the right forall telescope?
Eric Wieser (Jan 13 2024 at 15:35):
I think docs#Lean.Meta.withLocalDecl is what I wanted here
Joachim Breitner (Jan 13 2024 at 15:50):
I find it always scary when other people remember better what I did, or tired, or wondered about, in the past. Sometimes my :brain: is :cheese:. :shrug:
Eric Wieser (Jan 31 2024 at 23:26):
Eric Wieser said:
#9722 is my attempt at a fix
(See https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/noisy.20tests/near/419142386 for discussion about unrelated CI issues)
Mrigank Pawagi (Feb 21 2024 at 16:23):
I want to introduce some enhancements to SlimCheck, and I am looking for some benchmark to test my hypothesised enhancements. Is there any collection of incorrect claims that I could use? Any help would be very helpful.
Damiano Testa (Feb 21 2024 at 16:25):
You mean like
example {n : Nat} : n = n + 1 := sorry
?
Mrigank Pawagi (Feb 21 2024 at 16:25):
Yes
Damiano Testa (Feb 21 2024 at 16:31):
I do not know of any such repository, but, ideally, any result in mathlib actually uses all of its hypotheses: removing one of them should yield a false statement. Would this count?
Eric Wieser (Feb 22 2024 at 01:56):
I assume you're aware of the tests already in mathlib?
Mrigank Pawagi (Feb 22 2024 at 03:44):
@Damiano Testa I think that would be an interesting approach. I will try that. Also @Eric Wieser, no I wasn't aware but I just checked... There are a large number of tests over there. Thanks!
Damiano Testa (Feb 22 2024 at 12:55):
If you want something that is true for a long time, you can also use something like
import Mathlib
example {n : Nat} : if n < 100 then True else False := by
slim_check
sorry
Mrigank Pawagi (Feb 22 2024 at 16:50):
Sure, yes. However I am looking for cases where the counterexamples are hard to find using slimcheck's heuristic.
Eric Wieser (Feb 22 2024 at 16:57):
Slimcheck has a pretty fatal design flaw that means it cannot be used (nor extended) to find counter-examples for anything noncomputable
Eric Wieser (Feb 22 2024 at 16:58):
One interpretation of the flaw is that it should be sampling elements from Q($X)
, but currently samples them from X
Emilie (Shad Amethyst) (Feb 22 2024 at 20:47):
How about something like example {n : Nat} : Prime (2 ^ (n + 2) - 1)
?
Mrigank Pawagi (Feb 24 2024 at 17:41):
Yes @Emilie (Shad Amethyst), that is a good example!
Mrigank Pawagi (Feb 24 2024 at 17:52):
@Eric Wieser I see... I am still familiarising myself with Lean and although I can see what you might mean by noncomputable, I did not understand the Q($X)
analogy. Could you please expand on that a little bit? Thanks a lot!
Eric Wieser (Feb 24 2024 at 17:53):
Q($X)
is "lean expressions that evaluate to type X
"
Eric Wieser (Feb 24 2024 at 17:53):
As opposed to "terms of type X
"
Eric Wieser (Feb 24 2024 at 17:55):
For instance, (Expr.const ``Real.sqrt []).app r : Expr
is "the same" as q(Real.sqrt r) : Q(Real)
and both are computable, but Real.sqrt r
is not.
Mrigank Pawagi (Feb 24 2024 at 18:05):
Okay, I understand more now.
Mrigank Pawagi (Feb 24 2024 at 18:08):
I have previously worked on benchmarking Python code generated by LLMs, during which I realised that the "thoroughness" that property-based testing (like through SlimCheck) can achieve in finding counterexamples is matched by some generative-AI assisted techniques like type-aware mutations. I want to see if this can be extended to SlimCheck, but I now wonder if it is possible to "extend" SlimCheck at all.
Eric Wieser (Feb 24 2024 at 18:09):
Calling out to an AI is currently impossible in slim_check
, because the extensions have access to no state other than a random number generator; so no IO
Eric Wieser (Feb 24 2024 at 18:10):
#3838 was an attempt to change that (I think for unrelated reasons), though it makes the x : X
vs e : Q($X)
problem worse
Mrigank Pawagi (Feb 24 2024 at 18:16):
Ohh, okay. Yes, I see that calling an AI may not be possible from within SlimCheck. One could take this process out of the lean project itself and instead make something like a VSCode extension.
Last updated: May 02 2025 at 03:31 UTC