Zulip Chat Archive

Stream: lean4

Topic: Model finding/Automatic counter example generation in Lean

Jonas Bayer (Jul 15 2024 at 17:00):

As someone who originally started doing formalisation in Isabelle, I am used to having a counter example finder that lets me know when the theorem I just stated is wrong for trivial reasons. This has sometimes been a big time saver since it kept me from trying to formalise a wrong theorem.

I wondered if anyone is currently working on/planning to work on something similar for Lean. As far as I know, Mathlib has a tool called SlimCheck which is a port of Haskell's QuickCheck. Essentially that works by randomly generating finite structures and testing the given lemma on them. How widely is SlimCheck currently used?

Isabelle has two tools for model finding. The leightweight 'quickcheck' always runs in the background but has limited capability. And there is Nitpick which often succeeds in finding (finite) counterexamples when quickcheck fails, but it has to be invoked manually. Nitpick works by translating the problem to the logic of Kodkod which is a SAT-based model finder. It is not limited to counterexample search but can also find positive models. For example, when you make the definition of a graph you can let Nitpick generate a (finite) example and even specify the cardinality (number of vertices).

I assume that similar tools might exist in other proof assistants, but I've not done the research yet.

In general I find the topic very interesting and could imagine spending some time on it; which is why I wanted to check if other people are working on model finding in Lean already.

Henrik Böving (Jul 15 2024 at 17:04):

SlimCheck is the only tool dedicated to this currently. There are a few automation tactics that can also come up with counter examples. For example omega is a linear integer arithmetic tactic that tries to generate a counter example for the linear constraints it was given (not that this does not have to be sound, there might just be constrains that omega doesn't understand that invalidate the counter example) and LeanSAT is capable of counter example generation for arbitrary BitVec and boolean problems (same constraints thing as omega).

There does not currently exist a Nitpick style tool to my knowledge.

Jannis Limperg (Jul 15 2024 at 17:20):

... and afaik nobody is working on anything like it and it also doesn't exist for other proof assistants (but I could be wrong there). The problem, as always, is that it's a ton of work with very limited academic payoff.

Kim Morrison (Jul 16 2024 at 18:11):

SlimCheck could be a lot better than it is at present: no one has seriously worked on it since the initial implementation, so we're still missing basic generators for types, etc.

There's also some harder work but that would make SlimCheck more powerful:

switch to generating Exprs rather than actual terms
enable running MetaM code during generation, so e.g. we can run calculations to filter results. (There have been some previous attempts at this that foundered.)

Kim Morrison (Jul 16 2024 at 18:12):

I would love to see automatic counterexample generation on every new declaration!

Jonas Bayer (Jul 17 2024 at 09:12):

Thanks for the helpful replies! It's great to hear that there is interest in principle.

@Kim Morrison so you think that using SlimCheck as a basis for lightweight checking in the background would be a good approach? I'm not sure how Isabelle quickcheck works, it might well do a very similar thing but probably it's worthwhile to check.

Kim Morrison (Jul 17 2024 at 15:12):

To be honest, I don't have a super high level of confidence that we could get SlimCheck to the point that it is performant enough to run in the background everywhere. But this is ignorance rather than a specific negative reason. I do really hope that someone will experiment in this direction, and either come up with something we could deploy for everyone, or failing that teach us what needs to be done instead!

Mike Hicks (Oct 04 2024 at 17:51):

Coming back to this thread -- @Jonas Bayer did you spend any more time looking into SlimCheck for your use-case? It seems on a cursory search of GitHub that basically no one uses SlimCheck, and I am curious why that might be. Anyone know of serious use-cases?

I'm a big fan of property-based testing in general, and I'm also familiar with the QuickChick framework for Coq (see https://softwarefoundations.cis.upenn.edu/qc-current/index.html). I'd love to see good support for property-based testing in Lean, and I'd be willing to help with that.

@Kim Morrison if you had a wishlist of what needs to be done to get SlimCheck to the point that it could be used easily, even by new users, I'd be glad to know it. (I see your ideas above, but if you have more comprehensive thoughts I'm glad to know them.)

Mike Hicks (Oct 04 2024 at 19:29):

I also imagine that some potential users would prefer SlimCheck to not be in Mathlib, but rather in Std or someplace smaller ...

Henrik Böving (Oct 04 2024 at 19:30):

I've been contemplating to pull out slim check from mathlib for a while. Given that I wrote most of the ported code I would assume that's fine with mathlib

François G. Dorais (Oct 05 2024 at 05:17):

@Henrik Böving In the event that SlimCheck can't find a place in Lean, it would be a welcome addition to Batteries.

Jonas Bayer (Oct 08 2024 at 14:41):

@Mike Hicks I do not have a specific use case per se; the reason I'm interested in this topic is that in my experience it was valuable to have tools like Quickcheck/Nitpick in Isabelle. Sometimes when you state a lemma a hypothesis might be missing and these tools were quite good at detecting this (not 100% reliably, of course, but sufficiently often to be helpful).

The reason that you can't find example use cases of Slimcheck on Github might also be due to the nature of the tool. If its used in formalisation and detects that a lemma is trivially wrong the user will usually just fix the lemma and not keep the Slimcheck call in the proof. Though it might also just genuinely not be used a lot..

Eric Wieser (Oct 08 2024 at 14:49):

I've been contemplating to pull out slim check from mathlib for a while.

I think this probably isn't worth the churn; as written slim_check isn't really fit for purpose, as it can only find witnesses for decidable predicates, rather than norm_num- or aesop-able predicates. Changing this requires a change to slim_check to pass around Q(_) or Expr objects.

Eric Wieser (Oct 08 2024 at 14:50):

I think it would be best to improve it in mathlib where there are more meta reviewers, and migrate it only after such a refactor

Henrik Böving (Oct 08 2024 at 14:52):

Well apart form being useful as a tactic for these things slim_check can also be considered a general property testing tool that software people would want to use. These kinds of users often don't care about non decidable predicates so having slim_check available just like that + potentially a deriving mechanism would already be a huge win for that community. Given the pace of mathlib reviews development outside of mathlib would most likely also be faster.

Matthew Fairtlough (Oct 10 2024 at 23:03):

I'm intending to use SlimCheck extensively, it's a fantastic package and concept. I want to search for interesting theorems automatically and it's good to test before embarking on an attempted proof or disproof. And to use it for model checking would be interesting to me. For example, for checking statements in modal logic. But it really needs some documentation, that would be #1 on my wishlist. Even fixing the few examples in the sources would be great for a newcomer. Is anyone working on a SlimCheck manual I wonder?

Henrik Böving (Oct 11 2024 at 10:08):

Henrik Böving said:

Well apart form being useful as a tactic for these things slim_check can also be considered a general property testing tool that software people would want to use. These kinds of users often don't care about non decidable predicates so having slim_check available just like that + potentially a deriving mechanism would already be a huge win for that community. Given the pace of mathlib reviews development outside of mathlib would most likely also be faster.

So is there anyone here that has a hard reason against this? Otherwise I'll pull it to leanprover/SlimCheck today.

Mike Hicks (Oct 11 2024 at 10:28):

I spoke to another Lean user yesterday and she suggested that moving Slimcheck into Lean (rather than Std or Batteries) would be ideal. @François G. Dorais above also suggests going this way. Thanks!

Kevin Buzzard (Oct 11 2024 at 12:16):

I have this dream that slim_check will one day disprove statements about topological spaces (say) by trying a random topological space of size 3 and seeing what decide makes of the statement. Will moving it into core prevent this dream from happening?

Henrik Böving (Oct 11 2024 at 12:17):

No, the goal should always be to make it extensible

François G. Dorais (Oct 13 2024 at 03:22):

@Henrik Böving I lost track of the discussion, I really want to make SlimCheck more accessible. Getting it out of Mathlib is a great step in that direction. I'm just checking in what the current plan is.

SlimCheck makes perfect sense in a separate repo, it also makes sense in Lean and in Batteries. Which option are you going for?

Henrik Böving (Oct 13 2024 at 09:03):

The first

Henrik Böving (Oct 13 2024 at 09:38):

https://github.com/hargoniX/SlimCheck i put it here for now so I have somewhere to push while I work on pulling it from mathlib. But it'll eventually move to leanprover/ I think, so probably don't depend on this one (it doesn't even compile right now anyways)

Kim Morrison (Oct 13 2024 at 12:55):

I'm going to start a quick conversation amongst Mathlib maintainers to get approval for adding leanprover/SlimCheck as a Mathlib dependency. It may be that in this case it is pro forma, but we'll see what comes up.

Kim Morrison (Oct 13 2024 at 23:31):

Two things that came up in maintainers discussion:

Kim Morrison (Oct 13 2024 at 23:31):

if I understand correctly about our plans to work on this in the near future, perhaps moving it to leanprover-community/SlimCheck makes more sense for now than leanprover/SlimCheck.

Kim Morrison (Oct 13 2024 at 23:32):

perhaps the material on RandT and ULiftable that currently is in Mathlib, supporting SlimCheck, doesn't really belong in a standalone repository, and could be moved to Batteries with the separated SlimCheck depending on it.

François G. Dorais (Oct 14 2024 at 01:32):

RandT would be great in Batteries now that we will soon have Mersenne Twister batteries#984, and other PRNG planned.

Henrik Böving (Oct 14 2024 at 07:01):

Kim Morrison said:

perhaps the material on RandT and ULiftable that currently is in Mathlib, supporting SlimCheck, doesn't really belong in a standalone repository, and could be moved to Batteries with the separated SlimCheck depending on it.

ULiftable is not necessary, you can just write a 5 liner specific to Gen. RandT sure.

Mike Hicks (Oct 20 2024 at 18:50):

Probably this deserves another thread, but I'll post here to gauge interest:

The QuickChick library for Coq has some pretty cool support for automatically deriving generators for simple, inductively defined types that satisfy a dependent predicate. Think: Binary trees (the type) that satisfy the binary search tree ordering invariant (the predicate). The same support can also translate predicates in Prop into computable checkers, e.g., translating an inductive relation defining the BST invariant into a checker from binary trees to bool. All described in this paper.

This support makes QuickChick much more usable. What are folks' thoughts about trying to implement equivalent support for it in Slimcheck? It seems to me that one benefit of Lean, compared to Coq, is the much better metaprogramming support that should hopefully make it easier to implement.

Henrik Böving (Oct 20 2024 at 19:49):

This has been on my mind for years, it's quite an obvious step to take to improve the adoption of the tool across the eco system, just didn't have the time/priority for me yet unfortunately.

Kevin Buzzard (Oct 20 2024 at 19:54):

Would it be a good Masters project for a CS student or is it too technical?

Yuri (Oct 23 2024 at 12:43):

Mike Hicks said:

Probably this deserves another thread, but I'll post here to gauge interest:

... The same support can also translate predicates in Prop into computable checkers, e.g., translating an inductive relation defining the BST invariant into a checker from binary trees to bool. All described in this paper.

Oh, this idea of converting Prop predicates to checkers is interesting and could also be applied to FFI.

When interacting with foreign code or external components, it would be great to check (incoming) runtime values against checkers reified from Prop predicates found in the Lean external/export definitions.

Mike Hicks (Oct 23 2024 at 13:06):

It's a good point that this support is useful for more than just testing. It also enables proofs by reflection. You specify something as an inductive relation but then automatically generate a computational version of the same thing, proved equivalent to the original. Then you can switch between the two when doing proofs, whichever is more convenient.

In our Cedar development in Lean, we define type checking as a function, since ultimately we want to run it during testing. But this makes the proof of type soundness harder than it would be if we'd specified typing as an inductive relation instead (at least, that's my guess). Being able to derive the typechecker we have now from an inductive relation would have given us the best of both worlds, AFAICS.

Henrik Böving (Oct 23 2024 at 13:08):

For the specific use case in Cedar you are describing another option is to write the relation inductively and then a function that returns Decidable (HasType Ctx term typ)

Last updated: May 02 2025 at 03:31 UTC