Zulip Chat Archive

Stream: lean4

Topic: LSpec: A testing framework for Lean

Matej Penciak (May 27 2022 at 00:01):

Hello everyone!

@Arthur Paulino and I worked together to build a specification testing framework for Lean, and we're happy to share the first version: https://github.com/yatima-inc/LSpec . Hopefully it'll be helpful to everyone as people start to build bigger projects in Lean 4.

As the name suggests, it's loosely based off the similarly named "Hspec" for Haskell, but Lean's metaprogramming allowed for some fun design changes. Writing and evaluating specifications is hopefully relatively painless thanks to Lean's expressivity:

def foo (n : Nat) : Nat := n

mkspec fooSpec : foo := alwaysEquals foo 4

#spec Test fooSpec with 4 => "Always equals four"
-- Always equals four:
-- ✓ Success!

#spec Tests fooSpec with [3,4,5]
-- × Failure!
-- ✓ Success!
-- × Failure!

The project is still a WIP, and some features on our radar are randomized testing/fuzzing (like Quickcheck for Haskell), and a "test runner" that can hook in as a Lake script to run tests on builds.

Arthur Paulino (May 27 2022 at 00:05):

Please let us know if you have some use cases from HSpec that you really like and we will think how we can integrate it in our package. As always, feedback is very much welcome!

James Gallicchio (May 27 2022 at 04:58):

Do you recommend tests be put in a separate project directory (like Tests) or just littered around a project (i.e. in each relevant directory of the main project?)

Tomas Skrivan (May 27 2022 at 05:42):

It's really great that someone is working on this!

May I ask why do you introduce a new keywords like alwaysEquals? Can't you use decidable Prop?

I was thinking about testing a bit and I was thinking about something like:

mkspec expPositive := forall x : Float, Float.exp x >= 0

As x >= 0 is a decidable Prop you can then generate a test by providing a sampling method for the argument x.

Tomas Skrivan (May 27 2022 at 05:50):

For example, I will be doing testing for equality up to rounding errors. I do not see a reason for the framework to have a special support for that if it allows me to test arbitrary decidable Prop.

Henrik Böving (May 27 2022 at 07:12):

In general it would be nice if going from spec -> actual theorem would be as painless as possible (that includes syntax wise) so people in development can just write tests for their properties and if they decide to verify them can just swap out things quickly and start writing the proof.

Tomas Skrivan (May 27 2022 at 07:17):

What about having a specialized tactic like sorry but that defines a test.

Tomas Skrivan (May 27 2022 at 07:24):

e.g.

theorem expPositive (x : Float) : Float.exp x >= 0 := by
  test
     with x uniform_interval [-10, 10] 1000
     with x [-1, 0, 42]

This will generate two tests verifying the inequality. The first with 1000 samples x drawn with uniform distribution. The second test just runs it with three specific values.

Tomas Skrivan (May 27 2022 at 07:25):

Depending on your compiler/trust settings the tactic will show a similar squiggly line like sorry.

Tomas Skrivan (May 27 2022 at 08:22):

Similarly adding a support for asserts. I'm indexing my arrays with Fin n in order to eliminate all bound checks in my final code. However proving that i<n everywhere takes too much time, so I end up with many sorry for these bound checks. I would like to use a tactic assert that would either perform the runtime check(probably panic on failure) or just turn into sorry based on the assertion level.

Henrik Böving (May 27 2022 at 08:28):

Doing that as a tactic does not seem possible to me, after all proofs in Lean are irrelevant so proof producing terms wont be compiled to executable code.

Tomas Skrivan (May 27 2022 at 08:36):

Ahh you are right, that is a problem. However, we can have a macro theorem ... := test_by ... that generates those tests and optionally the theorem with sorry.

Tomas Skrivan (May 27 2022 at 08:39):

And at some point you can just replace test_by ... with actual proof.

Henrik Böving (May 27 2022 at 08:41):

Oh doing the testing with a tactic should be possible, after all those can be executed at compile time (although we may want to move to executable time for speed), I was more referring to your wish for an assert that you can just use anywhere so e.g. having a have : n < m := by assert somewhere in a do block would be a problem

Although this can probably be solved with a macro as well.

Tomas Skrivan (May 27 2022 at 08:45):

Yes the assert is a problem. In my use cases I'm almost exclusively working with Subtype. So I can have assert function X -> Subtype P that can run the test or just sorry it based on the assertion level.

Tomas Skrivan (May 27 2022 at 08:50):

With the runtime proof errasure, can't we use something like Thunk Prop to ensure the test does not get errased?

Tomas Skrivan (May 27 2022 at 08:51):

The only problem is that Thunk is currently defined only for Type and not Sort

Henrik Böving (May 27 2022 at 08:56):

You can't just trick the Lean compiler into ignoring axiom K like that

structure SortTunk (α : Sort u) where
  fn : Unit → α

def foo := (SortTunk.mk (λ _ => (True.intro)) : SortTunk True).fn ()

#eval foo -- invalid universe level, 0 is not greater than 0

Henrik Böving (May 27 2022 at 08:57):

What you can however do is have a macro assert! h : n > m that desugars into:

if h : n > m:
   <rest of code>
else
   unreachable!

Or something like that.

Kyle Miller (May 27 2022 at 09:22):

def assert {α : Type _} {p : Prop} [Decidable p] [Inhabited α] (f : p → α) : α :=
if h : p then f h else unreachable!

macro "assert!" x:Lean.explicitBinders " => " b:term : term => Lean.expandExplicitBinders `assert x b

#eval assert! (h : 2 < 5) => (⟨2, h⟩ : Fin 5)
-- 2

The assert function returns a Type and not a Sort to make sure the basic error of using this to create proofs doesn't occur, but it's not a perfect check.

Kyle Miller (May 27 2022 at 09:23):

It can take multiple hypotheses as binders:

#eval assert! (h : 2 < 5) (h' : 1 = 2) => (⟨2, h⟩ : Fin 5)
-- 0 (and "unreachable code has been reached")

Matej Penciak (May 27 2022 at 12:34):

James Gallicchio said:

Do you recommend tests be put in a separate project directory (like Tests) or just littered around a project (i.e. in each relevant directory of the main project?)

I think for now it shouldn't make a difference, and whichever works for your project is probably fine. Eventually when we have a test runner up and running, it'll probably look for specific files, but ideally those also can be scattered around the project.

Matej Penciak (May 27 2022 at 12:45):

Tomas Skrivan said:

It's really great that someone is working on this!

May I ask why do you introduce a new keywords like alwaysEquals? Can't you use decidable Prop?

I was thinking about testing a bit and I was thinking about something like:
mkspec expPositive := forall x : Float, Float.exp x >= 0
As x >= 0 is a decidable Prop you can then generate a test by providing a sampling method for the argument x.

The original version I wrote had everything in terms of Prop, but I found half the time writing tests I had to prove decidability and it would take a lot longer than I hoped defining specifications and examples.

I think having different trust settings is an interesting idea though. It's definitely part of the goal for this to help users go from testing specifications to actually formally verifying the property, and that would help a lot.

Henrik Böving (May 27 2022 at 12:49):

Oh and maybe you want to look into integrating it with slimcheck: https://github.com/leanprover-community/mathlib4/tree/master/Mathlib/Testing/SlimCheck which is the Lean quickcheck port, property testing seems to be one of the preferred ways at least in Haskell to run tests.

Arthur Paulino (May 27 2022 at 17:42):

James Gallicchio said:

Do you recommend tests be put in a separate project directory (like Tests) or just littered around a project (i.e. in each relevant directory of the main project?)

More on this: I was highly motivated by Rust's design of having tests spread all over the code. And the way that #spec works makes the lake build command break if a test doesn't succeed fully. The rationale is that incorrect code won't build!

Arthur Paulino (May 27 2022 at 17:45):

So if you see yourself forced to write a partial function you can't prove things about, write some tests below and proceed more tranquil. Make sure the edge cases work as expected, at least

James Gallicchio (May 27 2022 at 19:20):

Sure. I think that keeping tests in a separate file in the same directory hierarchy should work fine for my use cases.

James Gallicchio (May 27 2022 at 19:22):

How does LSpec interact with externs? One thing I'd really love is a spec that ensures a function matches its implementedBy on some provided set of inputs

James Gallicchio (May 27 2022 at 19:23):

(In fact, such a spec would increase my confidence in the correctness of LeanColls by many orders of magnitude)

Arthur Paulino (May 27 2022 at 19:53):

We haven't tested it in many scenarios yet. I'd expect it to work if your logic can run at compilation time.
For the things that can't be tested at compilation time, we want to provide a runner like Matej said. I'm 100% positive that we'll see the LSpec use cases expanding as we need it to do more

Arthur Paulino (Jun 02 2022 at 13:16):

We've made some changes on LSpec and now its use cases have been expanded. The README teaches you how to use it in two different ways:

Via the #lspec command
Via the lspec function, mixed with the lspec binary

It also teaches you how to integrate LSpec in your CI runs like in this example

Henrik Böving (Jun 02 2022 at 14:12):

I'm still not really sold on it, the approach you chose makes refactoring to formal verification seem like a bigger effort to me than it should be, you are locking the user in to your test framework types etc. which makes pulling out test cases you want to verify seem like a more of a hassle than it should be.

Instead of your Rel type why not use a Decidable Prop? That way nobody has to bother with specifically integrating their custom assertions into your thing but instead we all just implement Decidable instances for our things and can use them in programming, testing etc.
E.g. for your List assertions:

abbrev shouldNotBeEmpty (l : List α) := ¬l.isEmpty

#check shouldNotBeEmpty
variable (l : List α)
#synth Decidable (shouldNotBeEmpty l)

Now that tests are living in Prop we can easily refactor to theorem instead of LSpec if we so desire, for example I might have a series of tests:

abbrev shouldBeEmpty (l : List α) := l.isEmpty
abbrev shouldNotBeEmpty (l : List α) := ¬l.isEmpty

def twoTests : LSpec :=
  it "knows list with elements should not be empty" (shouldNotBeEmpty [1])
  it "knows list without elements should be empty" (shouldBeEmpty [])

and now if we refactor to formal verification I would ideally imagine we can just

theorem List.not_empty_of_elements (l r : List α) (x : α) : shouldNotBeEmpty (l ++ [x] ++ r) := by
  cases l <;> cases r <;> simp[isEmpty]

theorem List.nil_is_empty : @shouldBeEmpty α [] := by
  rfl

def twoTests : LSpec :=
  it "knows list with elements should not be empty" List.not_empty_of_elements
  it "knows list without elements should be empty" List.nil_is_empty

and now LSpec will report property 1 and 2 as formally verified (maybe it will even report what property was proven but that would require meta code).

Arthur Paulino (Jun 02 2022 at 14:21):

At first, I wouldn't want to bother with theorem proving If I choose to test things with LSpec. The way I think of LSpec is more similar to traditional unit testing. You can even write tests for partial functions.

Ideally, when using Lean 4, one would have theorems for the things they can and unit tests for the rest

Arthur Paulino (Jun 02 2022 at 14:27):

You can expect the user to implement decidability instances, but I, while using LSpec, don't want to implement those. I'm usually happy with test outcomes that evaluate booleans

Henrik Böving (Jun 02 2022 at 14:28):

But this is exactly the issue, when writing big software systems it is not realistic that you will verify everything from the beginning, what will most likely happen is:

You write the code
You write some LSpec tests to see if you are at least not completely wrong
You transfer the properties you wanna verify from LSpec to theorem, this might also be a very gradual process since verification can take long.

And if LSpec now doesn't bother to be compatible with Prop world you (unintentionally) discourage people from pulling tests out into theorems because they have to translate everything from your Rel type to Prop which is of course doable but rather annoying and if things are annoying people don't tend to do them.

Furthermore e.g. the spec framework in Haskell is not just a unit tester, it also allows you to have property tests with quickcheck etc. It is a mean to write an executable specification for your code. And if you have formally verified parts of that specification you would ideally want them to appear there as well so if somebody is wondering whether a certain property is tested/verfied they can just read the (executable) specification and see that it is indeed the case.

Henrik Böving (Jun 02 2022 at 14:28):

Arthur Paulino said:

You can expect the user to implement decidability instances, but I, while using LSpec, don't want to implement those. I'm usually happy with test outcomes that evaluate booleans

Well but what is the issue with extending to Decidable propeties then? All Booleans can be auto casted into Decidable Props

Arthur Paulino (Jun 02 2022 at 14:37):

I'm starting to see what you mean. That sounds like a nice next step. I suspect the challenge will be upgrading the #lspec command along with everything else

Tomas Skrivan (Jun 02 2022 at 16:54):

From my experience, I'm more and more convinced that using sorry should be much more encouraged and not make you feel guilty when you use it, at least for general programming. I find dependent types super powerful and useful not for proving stuff but that I can clearly state the specification of what my program is supposed to be doing.

I commonly implement decidable equality by effectively writing a boolean function and just use sorry for those equality proofs.

I think such practice should be encouraged and filling out those proofs is just an another way you can debug your code.

Tomas Skrivan (Jun 02 2022 at 17:00):

So I agree with Henrik that tests should live in Prop and I would add that it should be encouraged to implement decidable prop just as a boolean function with sorry. Full proofs can be filled in later when things start breaking.

Kyle Miller (Jun 02 2022 at 17:24):

I'm curious, what issues are you running into with writing decidable instances? Decidable is "just" Bool that also knows what it's saying is true or false.

You could manipulate Decidable values as if they were boolean values by creating some additional functions that are already present as instances:

def Decidable.and (hp : Decidable p) (hq : Decidable q) : Decidable (p ∧ q) :=
inferInstance

(It's admittedly trickier to work with Decidable than Bool since you have that dependent type.)

Arthur Paulino (Jun 02 2022 at 17:29):

The "issue" is mostly time, I believe. We were aiming at something minimal enough to serve our purposes. But of course it can be expanded/improved and more use cases can be implemented if/when needed

Kyle Miller (Jun 02 2022 at 17:31):

Arthur Paulino said:

You can expect the user to implement decidability instances, but I, while using LSpec, don't want to implement those. I'm usually happy with test outcomes that evaluate booleans

I think the point is that Decidable lets you use Prop as a DSL for boolean-valued functions, so that way you/users can use familiar syntax.

Arthur Paulino (Jun 02 2022 at 17:38):

Yeah, it's definitely worth a try at some point! If anything, it's always possible to offer a simpler API that just deals with Bools and another that handles Prop

Kyle Miller (Jun 02 2022 at 18:06):

@Arthur Paulino Here's what it could look like using Decidable:

import Lean

/--
The basic structure used to encode specifications.
It represents a list of tests, with each test consisting of
a description `descr`, some value `a`, and a test `rel` for `a`.
-/
inductive LSpec
  | done
  | more (descr : String) (a : α) (rel : α → Prop) (h : DecidablePred rel) (next : LSpec)
  deriving Inhabited

/--
`it` adds a single test to the testing suite.
This is the same as the projection `LSpec.more`, but extracting it out
allows us to set `next` with the default value `done`.
-/
def it (descr : String) (a : α) (rel : α → Prop) [DecidablePred rel] (next : LSpec := LSpec.done) :=
  LSpec.more descr a rel inferInstance next

/--
Prints the result of a test.
-/
def printMsg : String × Bool → String
  | (d, true)  => s!"✓ it {d}"
  | (d, false) => s!"× it {d}"

/--
Runs a set of `LSpec` tests and appends the results to
another list of results (given as input by the caller).
-/
def runLSpec (results : List (String × Bool)) : LSpec → List (String × Bool)
  | .more d a rel h next => (d, rel a) :: (runLSpec results next)
  | .done                => results.reverse

/--
Generates a report for all the results in a `LSpec` test,
returning `true` if all tests passed and `false` otherwise.
-/
def compileLSpecResult (t : LSpec) : Bool × String :=
  let res := runLSpec [] t
  (res.foldl (init := true) fun acc (_, r) => acc && r,
    "\n".intercalate $ res.map printMsg)

/--
Runs a `LSpec` test and pretty prints the results.
-/
def lspec (s : String) (t : LSpec) (_ : List String) : IO UInt32 := do
  IO.println s!"Testing that: {s}"
  let (success?, msg) := compileLSpecResult t
  if success?
    then IO.println  msg; return 0
    else IO.eprintln msg; return 1

syntax (name := lspecStx) "#lspec " term : command

open Lean Meta Elab Command Term in
/--
Custom elaborator for the `#lspec` command.
Basically it runs the normal `lspec` function and outputs to the Infoview.
-/
@[commandElab lspecStx] unsafe def elabLSpec : CommandElab := fun stx => do
  let term := stx[1]
  liftTermElabM `lspec do
    let term ← elabTerm term none
    synthesizeSyntheticMVarsNoPostponing
    let type ← inferType term
    if type.isAppOf ``LSpec then
      let res ← reduce $ ← mkAppM ``compileLSpecResult #[term]
      let success? : Bool ← evalExpr Bool ``Bool (res.getArg! 2)
      let msg : String ← evalExpr String ``String (res.getArg! 3)
      if success?
        then logInfo msg
        else throwError msg
    else throwError "Invalid term to run '#lspec' with"

def foo : LSpec := it "knows equality" 4 (λ x => x = 4)

#lspec foo

def twoTests : LSpec :=
  it "knows equality" 42 (λ x => x = 42) $
  it "knows lists" [42].length (λ x => x = 1)

def main :=
  lspec "some description" $
    twoTests

#eval main []

For example,

def foo : LSpec := it "knows equality" 4 (λ x => x = 4)

Kyle Miller (Jun 02 2022 at 18:07):

You can use boolean-valued functions too with this:

def foo : LSpec := it "knows equality" 4 (λ x => x == 4)

Arthur Paulino (Jun 02 2022 at 18:10):

Oh that was (a lot) smoother than I thought actually! Thanks!

Tomas Skrivan (Jun 02 2022 at 20:20):

Here is my idea how to write tests while using familiar theorem ... := <proof> syntax. Instead of <proof> you write test_by <test instructions>. It looks like this:

def foo (n : Nat) := 2 * n
def bar (n : Nat) := 2 * n + 1

-- All succeed
theorem test_foo (n : Nat) : foo n % 2 = 0 := test_by
  (n : Nat := 0)
  (n : Nat := 1)
  (n : Nat := 42)

-- All fail
theorem test_bar (n : Nat) : bar n % 2 = 0 := test_by
  (n : Nat := 0)
  (n : Nat := 1)
  (n : Nat := 42)

-- One fail, two succeed
theorem test (n : Nat) : n % 2 = 0 := test_by
  (n : Nat := 0)
  (n : Nat := 1)
  (n : Nat := 4)

It is just a proof of concept, the macro is super dumb.

The full code:

declare_syntax_cat testStx (behavior := both)

syntax "(" ident ":" term ":=" term ")" : testStx

syntax declModifiers "theorem " declId bracketedBinder* ":" term ":=" "test_by" testStx+ : command

def run_test (test : Bool) (message : String) : IO Unit :=
  if test then
    IO.println s!"Test passed! {message}"
  else
    throw (IO.userError s!"Test failed! {message}")

macro_rules
  | `($mods:declModifiers theorem $id:declId $params:bracketedBinder* : $thrm:term := test_by ( $x:ident : $X:term := $val:term ) ) =>
    let xname := Lean.Syntax.mkStrLit x.getId.toString
    let idname := Lean.Syntax.mkStrLit id[0].getId.toString
    `(#eval run_test (let $x : $X := $val; $thrm) s!"{$idname} with {$xname} := {$val}")

  | `($mods:declModifiers theorem $id:declId $params:bracketedBinder* : $thrm:term := test_by $test $tests*) =>
    `($mods:declModifiers theorem $id:declId $params:bracketedBinder* : $thrm:term := test_by $test
      $mods:declModifiers theorem $id:declId $params:bracketedBinder* : $thrm:term := test_by $tests*)


def foo (n : Nat) := 2 * n
def bar (n : Nat) := 2 * n + 1

-- All succeed
theorem test_foo (n : Nat) : foo n % 2 = 0 := test_by
  (n : Nat := 0)
  (n : Nat := 1)
  (n : Nat := 42)

-- All fail
theorem test_bar (n : Nat) : bar n % 2 = 0 := test_by
  (n : Nat := 0)
  (n : Nat := 1)
  (n : Nat := 42)

-- One fail, two succeed
theorem test (n : Nat) : n % 2 = 0 := test_by
  (n : Nat := 0)
  (n : Nat := 1)
  (n : Nat := 4)

Arthur Paulino (Jun 02 2022 at 20:36):

This looks cool!

Tomas Skrivan (Jun 03 2022 at 09:58):

Another thought, the code above generates #eval commands. Thus the test runs at compile time, this is undesirable for more heavy weight tests. It would be nice to alternatively generate IO Unit instead of #eval that can be executed after compilation is done.

Arthur Paulino (Jun 23 2022 at 15:18):

After great help from @Kyle Miller, we've upgraded LSpec to allow test statements to live in Prop. The evaluation was also improved, although it's still an #eval under the hood.

https://github.com/yatima-inc/LSpec

Kyle Miller (Jun 23 2022 at 16:12):

Something this is also able to do is give an explanation for why a test failed:

#lspec test "array eq" <| #[1,2,3] = (List.range 3).toArray
/-
× array eq
  Expected to be equal:
    #[1, 2, 3]
  and
    #[0, 1, 2]
-/

This doesn't require any metaprogramming beyond typeclasses. The trick here is to define a custom version of Decidable that includes a message for why it's false: https://github.com/yatima-inc/LSpec/blob/main/LSpec/LSpec.lean#L7

This particular example only depends on an instance for equality on a type that implements Repr and DecidableEq: https://github.com/yatima-inc/LSpec/blob/main/LSpec/Instances.lean#L6

Henrik Böving (Jun 23 2022 at 16:16):

One thing you could add to ease getting started is a prio := low instance for TDecidable that uses a Decidable as its basis and produces some totally generic error message. Apart from that it's looking quite nice to me now!

Kyle Miller (Jun 23 2022 at 16:34):

@Henrik Böving Don't you worry, it's already there: https://github.com/yatima-inc/LSpec/blob/main/LSpec/LSpec.lean#L11

Henrik Böving (Jun 23 2022 at 16:34):

Ah, I only looked in Instances.lean

Arthur Paulino (Jun 23 2022 at 16:53):

Maybe it should to go to Instances.lean actually. Will do in a next update. I think the next big improvement is implementing Tomas' idea of replacing the #eval call. But I won't be able to work on it for a while

Tomas Skrivan (Jun 23 2022 at 18:27):

Looks nice, I'm really glad you went for the Prop based tests!

Are you also planning to add tests for tactics? All tests in my project are just testing if a tactic can solve a goal or not. It would be nice to get the same output for these kinds of tests. Currently I just check if the lean compiler successfully compiles a file with tests or not.

Arthur Paulino (Jun 23 2022 at 19:33):

That's certainly a nice upgrade too. Maybe monadic testing first, then tactics? idk

Arthur Paulino (Jul 12 2022 at 13:42):

Hi everyone, we've released a new version of LSpec!

It's compatible with the most updated Lake release and makes use of its capabilities to run tests on properly compiled binaries rather than relying on running Lean code as script.

Besides that major upgrade, we've also improved the API. We now provide two different lspec and a lspecIO functions. The former is meant for general purpose testing inside any function and the later is meant to be used as the body of a main function of a Lean file to be compiled by Lake.

Last updated: May 02 2025 at 03:31 UTC