Zulip Chat Archive
Stream: lean4
Topic: LSpec: A testing framework for Lean
Matej Penciak (May 27 2022 at 00:01):
Hello everyone!
@Arthur Paulino and I worked together to build a specification testing framework for Lean, and we're happy to share the first version: https://github.com/yatima-inc/LSpec . Hopefully it'll be helpful to everyone as people start to build bigger projects in Lean 4.
As the name suggests, it's loosely based off the similarly named "Hspec" for Haskell, but Lean's metaprogramming allowed for some fun design changes. Writing and evaluating specifications is hopefully relatively painless thanks to Lean's expressivity:
def foo (n : Nat) : Nat := n
mkspec fooSpec : foo := alwaysEquals foo 4
#spec Test fooSpec with 4 => "Always equals four"
-- Always equals four:
-- ✓ Success!
#spec Tests fooSpec with [3,4,5]
-- × Failure!
-- ✓ Success!
-- × Failure!
The project is still a WIP, and some features on our radar are randomized testing/fuzzing (like Quickcheck for Haskell), and a "test runner" that can hook in as a Lake script to run tests on builds.
Arthur Paulino (May 27 2022 at 00:05):
Please let us know if you have some use cases from HSpec that you really like and we will think how we can integrate it in our package. As always, feedback is very much welcome!
James Gallicchio (May 27 2022 at 04:58):
Do you recommend tests be put in a separate project directory (like Tests
) or just littered around a project (i.e. in each relevant directory of the main project?)
Tomas Skrivan (May 27 2022 at 05:42):
It's really great that someone is working on this!
May I ask why do you introduce a new keywords like alwaysEquals
? Can't you use decidable Prop?
I was thinking about testing a bit and I was thinking about something like:
mkspec expPositive := forall x : Float, Float.exp x >= 0
As x >= 0
is a decidable Prop you can then generate a test by providing a sampling method for the argument x
.
Tomas Skrivan (May 27 2022 at 05:50):
For example, I will be doing testing for equality up to rounding errors. I do not see a reason for the framework to have a special support for that if it allows me to test arbitrary decidable Prop.
Henrik Böving (May 27 2022 at 07:12):
In general it would be nice if going from spec -> actual theorem would be as painless as possible (that includes syntax wise) so people in development can just write tests for their properties and if they decide to verify them can just swap out things quickly and start writing the proof.
Tomas Skrivan (May 27 2022 at 07:17):
What about having a specialized tactic like sorry
but that defines a test.
Tomas Skrivan (May 27 2022 at 07:24):
e.g.
theorem expPositive (x : Float) : Float.exp x >= 0 := by
test
with x uniform_interval [-10, 10] 1000
with x [-1, 0, 42]
This will generate two tests verifying the inequality. The first with 1000 samples x
drawn with uniform distribution. The second test just runs it with three specific values.
Tomas Skrivan (May 27 2022 at 07:25):
Depending on your compiler/trust settings the tactic will show a similar squiggly line like sorry
.
Tomas Skrivan (May 27 2022 at 08:22):
Similarly adding a support for asserts. I'm indexing my arrays with Fin n
in order to eliminate all bound checks in my final code. However proving that i<n
everywhere takes too much time, so I end up with many sorry
for these bound checks. I would like to use a tactic assert
that would either perform the runtime check(probably panic on failure) or just turn into sorry based on the assertion level.
Henrik Böving (May 27 2022 at 08:28):
Doing that as a tactic does not seem possible to me, after all proofs in Lean are irrelevant so proof producing terms wont be compiled to executable code.
Tomas Skrivan (May 27 2022 at 08:36):
Ahh you are right, that is a problem. However, we can have a macro theorem ... := test_by ...
that generates those tests and optionally the theorem with sorry.
Tomas Skrivan (May 27 2022 at 08:39):
And at some point you can just replace test_by ...
with actual proof.
Henrik Böving (May 27 2022 at 08:41):
Oh doing the testing with a tactic should be possible, after all those can be executed at compile time (although we may want to move to executable time for speed), I was more referring to your wish for an assert
that you can just use anywhere so e.g. having a have : n < m := by assert
somewhere in a do
block would be a problem
Although this can probably be solved with a macro as well.
Tomas Skrivan (May 27 2022 at 08:45):
Yes the assert is a problem. In my use cases I'm almost exclusively working with Subtype. So I can have assert function X -> Subtype P
that can run the test or just sorry it based on the assertion level.
Tomas Skrivan (May 27 2022 at 08:50):
With the runtime proof errasure, can't we use something like Thunk Prop
to ensure the test does not get errased?
Tomas Skrivan (May 27 2022 at 08:51):
The only problem is that Thunk
is currently defined only for Type
and not Sort
Henrik Böving (May 27 2022 at 08:56):
You can't just trick the Lean compiler into ignoring axiom K like that
structure SortTunk (α : Sort u) where
fn : Unit → α
def foo := (SortTunk.mk (λ _ => (True.intro)) : SortTunk True).fn ()
#eval foo -- invalid universe level, 0 is not greater than 0
Henrik Böving (May 27 2022 at 08:57):
What you can however do is have a macro assert! h : n > m
that desugars into:
if h : n > m:
<rest of code>
else
unreachable!
Or something like that.
Kyle Miller (May 27 2022 at 09:22):
def assert {α : Type _} {p : Prop} [Decidable p] [Inhabited α] (f : p → α) : α :=
if h : p then f h else unreachable!
macro "assert!" x:Lean.explicitBinders " => " b:term : term => Lean.expandExplicitBinders `assert x b
#eval assert! (h : 2 < 5) => (⟨2, h⟩ : Fin 5)
-- 2
The assert
function returns a Type
and not a Sort
to make sure the basic error of using this to create proofs doesn't occur, but it's not a perfect check.
Kyle Miller (May 27 2022 at 09:23):
It can take multiple hypotheses as binders:
#eval assert! (h : 2 < 5) (h' : 1 = 2) => (⟨2, h⟩ : Fin 5)
-- 0 (and "unreachable code has been reached")
Matej Penciak (May 27 2022 at 12:34):
James Gallicchio said:
Do you recommend tests be put in a separate project directory (like
Tests
) or just littered around a project (i.e. in each relevant directory of the main project?)
I think for now it shouldn't make a difference, and whichever works for your project is probably fine. Eventually when we have a test runner up and running, it'll probably look for specific files, but ideally those also can be scattered around the project.
Matej Penciak (May 27 2022 at 12:45):
Tomas Skrivan said:
It's really great that someone is working on this!
May I ask why do you introduce a new keywords like
alwaysEquals
? Can't you use decidable Prop?I was thinking about testing a bit and I was thinking about something like:
mkspec expPositive := forall x : Float, Float.exp x >= 0
As
x >= 0
is a decidable Prop you can then generate a test by providing a sampling method for the argumentx
.
The original version I wrote had everything in terms of Prop
, but I found half the time writing tests I had to prove decidability and it would take a lot longer than I hoped defining specifications and examples.
I think having different trust settings is an interesting idea though. It's definitely part of the goal for this to help users go from testing specifications to actually formally verifying the property, and that would help a lot.
Henrik Böving (May 27 2022 at 12:49):
Oh and maybe you want to look into integrating it with slimcheck: https://github.com/leanprover-community/mathlib4/tree/master/Mathlib/Testing/SlimCheck which is the Lean quickcheck port, property testing seems to be one of the preferred ways at least in Haskell to run tests.
Arthur Paulino (May 27 2022 at 17:42):
James Gallicchio said:
Do you recommend tests be put in a separate project directory (like
Tests
) or just littered around a project (i.e. in each relevant directory of the main project?)
More on this: I was highly motivated by Rust's design of having tests spread all over the code. And the way that #spec
works makes the lake build
command break if a test doesn't succeed fully. The rationale is that incorrect code won't build!
Arthur Paulino (May 27 2022 at 17:45):
So if you see yourself forced to write a partial
function you can't prove things about, write some tests below and proceed more tranquil. Make sure the edge cases work as expected, at least
James Gallicchio (May 27 2022 at 19:20):
Sure. I think that keeping tests in a separate file in the same directory hierarchy should work fine for my use cases.
James Gallicchio (May 27 2022 at 19:22):
How does LSpec interact with externs? One thing I'd really love is a spec that ensures a function matches its implementedBy
on some provided set of inputs
James Gallicchio (May 27 2022 at 19:23):
(In fact, such a spec would increase my confidence in the correctness of LeanColls by many orders of magnitude)
Arthur Paulino (May 27 2022 at 19:53):
We haven't tested it in many scenarios yet. I'd expect it to work if your logic can run at compilation time.
For the things that can't be tested at compilation time, we want to provide a runner like Matej said. I'm 100% positive that we'll see the LSpec use cases expanding as we need it to do more
Arthur Paulino (Jun 02 2022 at 13:16):
We've made some changes on LSpec and now its use cases have been expanded. The README teaches you how to use it in two different ways:
- Via the
#lspec
command - Via the
lspec
function, mixed with thelspec
binary
It also teaches you how to integrate LSpec in your CI runs like in this example
Henrik Böving (Jun 02 2022 at 14:12):
I'm still not really sold on it, the approach you chose makes refactoring to formal verification seem like a bigger effort to me than it should be, you are locking the user in to your test framework types etc. which makes pulling out test cases you want to verify seem like a more of a hassle than it should be.
Instead of your Rel
type why not use a Decidable
Prop
? That way nobody has to bother with specifically integrating their custom assertions into your thing but instead we all just implement Decidable
instances for our things and can use them in programming, testing etc.
E.g. for your List assertions:
abbrev shouldNotBeEmpty (l : List α) := ¬l.isEmpty
#check shouldNotBeEmpty
variable (l : List α)
#synth Decidable (shouldNotBeEmpty l)
Now that tests are living in Prop
we can easily refactor to theorem
instead of LSpec if we so desire, for example I might have a series of tests:
abbrev shouldBeEmpty (l : List α) := l.isEmpty
abbrev shouldNotBeEmpty (l : List α) := ¬l.isEmpty
def twoTests : LSpec :=
it "knows list with elements should not be empty" (shouldNotBeEmpty [1])
it "knows list without elements should be empty" (shouldBeEmpty [])
and now if we refactor to formal verification I would ideally imagine we can just
theorem List.not_empty_of_elements (l r : List α) (x : α) : shouldNotBeEmpty (l ++ [x] ++ r) := by
cases l <;> cases r <;> simp[isEmpty]
theorem List.nil_is_empty : @shouldBeEmpty α [] := by
rfl
def twoTests : LSpec :=
it "knows list with elements should not be empty" List.not_empty_of_elements
it "knows list without elements should be empty" List.nil_is_empty
and now LSpec will report property 1 and 2 as formally verified (maybe it will even report what property was proven but that would require meta code).
Arthur Paulino (Jun 02 2022 at 14:21):
At first, I wouldn't want to bother with theorem proving If I choose to test things with LSpec. The way I think of LSpec is more similar to traditional unit testing. You can even write tests for partial functions.
Ideally, when using Lean 4, one would have theorems for the things they can and unit tests for the rest
Arthur Paulino (Jun 02 2022 at 14:27):
You can expect the user to implement decidability instances, but I, while using LSpec, don't want to implement those. I'm usually happy with test outcomes that evaluate booleans
Henrik Böving (Jun 02 2022 at 14:28):
But this is exactly the issue, when writing big software systems it is not realistic that you will verify everything from the beginning, what will most likely happen is:
- You write the code
- You write some LSpec tests to see if you are at least not completely wrong
- You transfer the properties you wanna verify from LSpec to theorem, this might also be a very gradual process since verification can take long.
And if LSpec now doesn't bother to be compatible with Prop
world you (unintentionally) discourage people from pulling tests out into theorems because they have to translate everything from your Rel
type to Prop
which is of course doable but rather annoying and if things are annoying people don't tend to do them.
Furthermore e.g. the spec framework in Haskell is not just a unit tester, it also allows you to have property tests with quickcheck
etc. It is a mean to write an executable specification for your code. And if you have formally verified parts of that specification you would ideally want them to appear there as well so if somebody is wondering whether a certain property is tested/verfied they can just read the (executable) specification and see that it is indeed the case.
Henrik Böving (Jun 02 2022 at 14:28):
Arthur Paulino said:
You can expect the user to implement decidability instances, but I, while using LSpec, don't want to implement those. I'm usually happy with test outcomes that evaluate booleans
Well but what is the issue with extending to Decidable propeties then? All Booleans can be auto casted into Decidable Props
Arthur Paulino (Jun 02 2022 at 14:37):
I'm starting to see what you mean. That sounds like a nice next step. I suspect the challenge will be upgrading the #lspec
command along with everything else
Tomas Skrivan (Jun 02 2022 at 16:54):
From my experience, I'm more and more convinced that using sorry
should be much more encouraged and not make you feel guilty when you use it, at least for general programming. I find dependent types super powerful and useful not for proving stuff but that I can clearly state the specification of what my program is supposed to be doing.
I commonly implement decidable equality by effectively writing a boolean function and just use sorry for those equality proofs.
I think such practice should be encouraged and filling out those proofs is just an another way you can debug your code.
Tomas Skrivan (Jun 02 2022 at 17:00):
So I agree with Henrik that tests should live in Prop and I would add that it should be encouraged to implement decidable prop just as a boolean function with sorry. Full proofs can be filled in later when things start breaking.
Kyle Miller (Jun 02 2022 at 17:24):
I'm curious, what issues are you running into with writing decidable instances? Decidable
is "just" Bool
that also knows what it's saying is true or false.
You could manipulate Decidable
values as if they were boolean values by creating some additional functions that are already present as instances:
def Decidable.and (hp : Decidable p) (hq : Decidable q) : Decidable (p ∧ q) :=
inferInstance
(It's admittedly trickier to work with Decidable
than Bool
since you have that dependent type.)
Arthur Paulino (Jun 02 2022 at 17:29):
The "issue" is mostly time, I believe. We were aiming at something minimal enough to serve our purposes. But of course it can be expanded/improved and more use cases can be implemented if/when needed
Kyle Miller (Jun 02 2022 at 17:31):
Arthur Paulino said:
You can expect the user to implement decidability instances, but I, while using LSpec, don't want to implement those. I'm usually happy with test outcomes that evaluate booleans
I think the point is that Decidable
lets you use Prop
as a DSL for boolean-valued functions, so that way you/users can use familiar syntax.
Arthur Paulino (Jun 02 2022 at 17:38):
Yeah, it's definitely worth a try at some point! If anything, it's always possible to offer a simpler API that just deals with Bools and another that handles Prop
Kyle Miller (Jun 02 2022 at 18:06):
@Arthur Paulino Here's what it could look like using Decidable
:
For example,
def foo : LSpec := it "knows equality" 4 (λ x => x = 4)
Kyle Miller (Jun 02 2022 at 18:07):
You can use boolean-valued functions too with this:
def foo : LSpec := it "knows equality" 4 (λ x => x == 4)
Arthur Paulino (Jun 02 2022 at 18:10):
Oh that was (a lot) smoother than I thought actually! Thanks!
Tomas Skrivan (Jun 02 2022 at 20:20):
Here is my idea how to write tests while using familiar theorem ... := <proof>
syntax. Instead of <proof>
you write test_by <test instructions>
. It looks like this:
def foo (n : Nat) := 2 * n
def bar (n : Nat) := 2 * n + 1
-- All succeed
theorem test_foo (n : Nat) : foo n % 2 = 0 := test_by
(n : Nat := 0)
(n : Nat := 1)
(n : Nat := 42)
-- All fail
theorem test_bar (n : Nat) : bar n % 2 = 0 := test_by
(n : Nat := 0)
(n : Nat := 1)
(n : Nat := 42)
-- One fail, two succeed
theorem test (n : Nat) : n % 2 = 0 := test_by
(n : Nat := 0)
(n : Nat := 1)
(n : Nat := 4)
It is just a proof of concept, the macro is super dumb.
The full code:
declare_syntax_cat testStx (behavior := both)
syntax "(" ident ":" term ":=" term ")" : testStx
syntax declModifiers "theorem " declId bracketedBinder* ":" term ":=" "test_by" testStx+ : command
def run_test (test : Bool) (message : String) : IO Unit :=
if test then
IO.println s!"Test passed! {message}"
else
throw (IO.userError s!"Test failed! {message}")
macro_rules
| `($mods:declModifiers theorem $id:declId $params:bracketedBinder* : $thrm:term := test_by ( $x:ident : $X:term := $val:term ) ) =>
let xname := Lean.Syntax.mkStrLit x.getId.toString
let idname := Lean.Syntax.mkStrLit id[0].getId.toString
`(#eval run_test (let $x : $X := $val; $thrm) s!"{$idname} with {$xname} := {$val}")
| `($mods:declModifiers theorem $id:declId $params:bracketedBinder* : $thrm:term := test_by $test $tests*) =>
`($mods:declModifiers theorem $id:declId $params:bracketedBinder* : $thrm:term := test_by $test
$mods:declModifiers theorem $id:declId $params:bracketedBinder* : $thrm:term := test_by $tests*)
def foo (n : Nat) := 2 * n
def bar (n : Nat) := 2 * n + 1
-- All succeed
theorem test_foo (n : Nat) : foo n % 2 = 0 := test_by
(n : Nat := 0)
(n : Nat := 1)
(n : Nat := 42)
-- All fail
theorem test_bar (n : Nat) : bar n % 2 = 0 := test_by
(n : Nat := 0)
(n : Nat := 1)
(n : Nat := 42)
-- One fail, two succeed
theorem test (n : Nat) : n % 2 = 0 := test_by
(n : Nat := 0)
(n : Nat := 1)
(n : Nat := 4)
Arthur Paulino (Jun 02 2022 at 20:36):
This looks cool!
Tomas Skrivan (Jun 03 2022 at 09:58):
Another thought, the code above generates #eval
commands. Thus the test runs at compile time, this is undesirable for more heavy weight tests. It would be nice to alternatively generate IO Unit
instead of #eval
that can be executed after compilation is done.
Arthur Paulino (Jun 23 2022 at 15:18):
After great help from @Kyle Miller, we've upgraded LSpec to allow test statements to live in Prop
. The evaluation was also improved, although it's still an #eval
under the hood.
https://github.com/yatima-inc/LSpec
Kyle Miller (Jun 23 2022 at 16:12):
Something this is also able to do is give an explanation for why a test failed:
#lspec test "array eq" <| #[1,2,3] = (List.range 3).toArray
/-
× array eq
Expected to be equal:
#[1, 2, 3]
and
#[0, 1, 2]
-/
This doesn't require any metaprogramming beyond typeclasses. The trick here is to define a custom version of Decidable
that includes a message for why it's false: https://github.com/yatima-inc/LSpec/blob/main/LSpec/LSpec.lean#L7
This particular example only depends on an instance for equality on a type that implements Repr
and DecidableEq
: https://github.com/yatima-inc/LSpec/blob/main/LSpec/Instances.lean#L6
Henrik Böving (Jun 23 2022 at 16:16):
One thing you could add to ease getting started is a prio := low
instance for TDecidable
that uses a Decidable
as its basis and produces some totally generic error message. Apart from that it's looking quite nice to me now!
Kyle Miller (Jun 23 2022 at 16:34):
@Henrik Böving Don't you worry, it's already there: https://github.com/yatima-inc/LSpec/blob/main/LSpec/LSpec.lean#L11
Henrik Böving (Jun 23 2022 at 16:34):
Ah, I only looked in Instances.lean
Arthur Paulino (Jun 23 2022 at 16:53):
Maybe it should to go to Instances.lean
actually. Will do in a next update. I think the next big improvement is implementing Tomas' idea of replacing the #eval
call. But I won't be able to work on it for a while
Tomas Skrivan (Jun 23 2022 at 18:27):
Looks nice, I'm really glad you went for the Prop based tests!
Are you also planning to add tests for tactics? All tests in my project are just testing if a tactic can solve a goal or not. It would be nice to get the same output for these kinds of tests. Currently I just check if the lean compiler successfully compiles a file with tests or not.
Arthur Paulino (Jun 23 2022 at 19:33):
That's certainly a nice upgrade too. Maybe monadic testing first, then tactics? idk
Arthur Paulino (Jul 12 2022 at 13:42):
Hi everyone, we've released a new version of LSpec!
It's compatible with the most updated Lake release and makes use of its capabilities to run tests on properly compiled binaries rather than relying on running Lean code as script.
Besides that major upgrade, we've also improved the API. We now provide two different lspec
and a lspecIO
functions. The former is meant for general purpose testing inside any function and the later is meant to be used as the body of a main
function of a Lean file to be compiled by Lake.
Last updated: Dec 20 2023 at 11:08 UTC