Zulip Chat Archive

Stream: mathlib4

Topic: slim_check question

Kevin Buzzard (Jan 13 2024 at 13:08):

Should this happen? Am I misunderstanding something?

import Mathlib.Data.Real.GoldenRatio
import Mathlib.Tactic

open BigOperators

example (q : ℕ) : Nat.fib (2 * q + 1) * 2 ^ (2 * q) =
    ∑ k in Finset.range (q + 1), 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
  slim_check
  /-
  Found problems!
  q := 1
  issue: 8 = 6 does not hold
  (0 shrinks)
  -/
  sorry

example (q : ℕ) : Nat.fib (2 * q + 1) * 2 ^ (2 * q) =
    ∑ k in Finset.range (q + 1), 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
  have hq : q = 1 := sorry -- test this case in the kernel
  subst hq
  rfl -- works fine??

(context: I am currently totally confused about whether the thing I'm trying to prove is true)

Eric Wieser (Jan 13 2024 at 13:56):

set_option trace.slim_check.decoration true shows that something is going wrong

Eric Wieser (Jan 13 2024 at 13:56):

It replaced one of the qs with k!

Eric Wieser (Jan 13 2024 at 13:58):

I would guess the bug is this raw matching on forallE: https://github.com/leanprover-community/mathlib4/blob/2523a65bd2b94019a3afb291e9499abcaa0f7c0a/Mathlib/Testing/SlimCheck/Testable.lean#L514C19-L519

Henrik Böving (Jan 13 2024 at 14:00):

Adding the data back in there in L518 is definitely wrong I would say? you're definitely supposed to recompute it after potentially changing the expression right?

Kevin Buzzard (Jan 13 2024 at 14:08):

FWIW here's slim_check finding problems with a definitely-true goal:

import Mathlib.Algebra.BigOperators.Basic
import Mathlib.Tactic.SlimCheck

open BigOperators

example (q : ℕ) : q = 0 ∨ q ≥ 2 ∨
    8 = ∑ k in Finset.range 2, 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
  slim_check -- claims to find problem with q=1

example (q : ℕ) : q = 0 ∨ q ≥ 2 ∨
    8 = ∑ k in Finset.range 2, 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
  match q with
  | 0 => left; rfl
  | 1 => right; right; rfl
  | (n + 2) => right; left; simp only [ge_iff_le, le_add_iff_nonneg_left, zero_le]
  -- but there are no problems

Henrik Böving (Jan 13 2024 at 14:17):

Kevin Buzzard said:

FWIW here's slim_check finding problems with a definitely-true goal:

import Mathlib.Algebra.BigOperators.Basic
import Mathlib.Tactic.SlimCheck

open BigOperators

example (q : ℕ) : q = 0 ∨ q ≥ 2 ∨
    8 = ∑ k in Finset.range 2, 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
  slim_check -- claims to find problem with q=1

example (q : ℕ) : q = 0 ∨ q ≥ 2 ∨
    8 = ∑ k in Finset.range 2, 5 ^ k * Nat.choose (2 * q + 1) (2 * k + 1) := by
  match q with
  | 0 => left; rfl
  | 1 => right; right; rfl
  | (n + 2) => right; left; simp only [ge_iff_le, le_add_iff_nonneg_left, zero_le]
  -- but there are no problems

I do btw find this rather amusing because we actually made slimcheck proof producing to stop this from happening, but I guess when we change the statement its still possible to cheat :D

Eric Wieser (Jan 13 2024 at 14:20):

I think using mapForallTelescope is almost the right tool here, but I can't find out how to force it to do one binder at a time

Eric Wieser (Jan 13 2024 at 14:21):

Almost of the forallTelescope API seems to be greedy, and I can't see an easy way to move inside a single binder

Eric Wieser (Jan 13 2024 at 14:32):

#9722 is my attempt at a fix

Eric Rodriguez (Jan 13 2024 at 15:28):

Didn't @Joachim Breitner run into something similar with loogle about wanting the right forall telescope?

Eric Wieser (Jan 13 2024 at 15:35):

I think docs#Lean.Meta.withLocalDecl is what I wanted here

Joachim Breitner (Jan 13 2024 at 15:50):

I find it always scary when other people remember better what I did, or tired, or wondered about, in the past. Sometimes my :brain: is :cheese:. :shrug:

Eric Wieser (Jan 31 2024 at 23:26):

Eric Wieser said:

#9722 is my attempt at a fix

(See https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/noisy.20tests/near/419142386 for discussion about unrelated CI issues)

Mrigank Pawagi (Feb 21 2024 at 16:23):

I want to introduce some enhancements to SlimCheck, and I am looking for some benchmark to test my hypothesised enhancements. Is there any collection of incorrect claims that I could use? Any help would be very helpful.

Damiano Testa (Feb 21 2024 at 16:25):

You mean like

example {n : Nat} : n = n + 1 := sorry

Mrigank Pawagi (Feb 21 2024 at 16:25):

Yes

Damiano Testa (Feb 21 2024 at 16:31):

I do not know of any such repository, but, ideally, any result in mathlib actually uses all of its hypotheses: removing one of them should yield a false statement. Would this count?

Eric Wieser (Feb 22 2024 at 01:56):

I assume you're aware of the tests already in mathlib?

Mrigank Pawagi (Feb 22 2024 at 03:44):

@Damiano Testa I think that would be an interesting approach. I will try that. Also @Eric Wieser, no I wasn't aware but I just checked... There are a large number of tests over there. Thanks!

Damiano Testa (Feb 22 2024 at 12:55):

If you want something that is true for a long time, you can also use something like

import Mathlib

example {n : Nat} : if n < 100 then True else False := by
  slim_check
  sorry

Mrigank Pawagi (Feb 22 2024 at 16:50):

Sure, yes. However I am looking for cases where the counterexamples are hard to find using slimcheck's heuristic.

Eric Wieser (Feb 22 2024 at 16:57):

Slimcheck has a pretty fatal design flaw that means it cannot be used (nor extended) to find counter-examples for anything noncomputable

Eric Wieser (Feb 22 2024 at 16:58):

One interpretation of the flaw is that it should be sampling elements from Q($X), but currently samples them from X

Emilie (Shad Amethyst) (Feb 22 2024 at 20:47):

How about something like example {n : Nat} : Prime (2 ^ (n + 2) - 1)?

Mrigank Pawagi (Feb 24 2024 at 17:41):

Yes @Emilie (Shad Amethyst), that is a good example!

Mrigank Pawagi (Feb 24 2024 at 17:52):

@Eric Wieser I see... I am still familiarising myself with Lean and although I can see what you might mean by noncomputable, I did not understand the Q($X) analogy. Could you please expand on that a little bit? Thanks a lot!

Eric Wieser (Feb 24 2024 at 17:53):

Q($X) is "lean expressions that evaluate to type X"

Eric Wieser (Feb 24 2024 at 17:53):

As opposed to "terms of type X"

Eric Wieser (Feb 24 2024 at 17:55):

For instance, (Expr.const ``Real.sqrt []).app r : Expr is "the same" as q(Real.sqrt r) : Q(Real) and both are computable, but Real.sqrt r is not.

Mrigank Pawagi (Feb 24 2024 at 18:05):

Okay, I understand more now.

Mrigank Pawagi (Feb 24 2024 at 18:08):

I have previously worked on benchmarking Python code generated by LLMs, during which I realised that the "thoroughness" that property-based testing (like through SlimCheck) can achieve in finding counterexamples is matched by some generative-AI assisted techniques like type-aware mutations. I want to see if this can be extended to SlimCheck, but I now wonder if it is possible to "extend" SlimCheck at all.

Eric Wieser (Feb 24 2024 at 18:09):

Calling out to an AI is currently impossible in slim_check, because the extensions have access to no state other than a random number generator; so no IO

Eric Wieser (Feb 24 2024 at 18:10):

#3838 was an attempt to change that (I think for unrelated reasons), though it makes the x : X vs e : Q($X) problem worse

Mrigank Pawagi (Feb 24 2024 at 18:16):

Ohh, okay. Yes, I see that calling an AI may not be possible from within SlimCheck. One could take this process out of the lean project itself and instead make something like a VSCode extension.

Last updated: May 02 2025 at 03:31 UTC