Zulip Chat Archive

Stream: general

Topic: data-driven unit tests

Nathan Glenn (Jan 22 2023 at 13:57):

Hi, I'm working through the "Logical Verification" course materials from VU. I'm working on the first exercises and came upon this section: https://github.com/blanchette/logical_verification_2022/blob/main/lean/love01_definitions_and_statements_exercise_sheet.lean#L29
11 #check statements with instructions to mouse over each and check that the value is correct. Shopping around for other examples, I found the idea of changing each of the #check statements to a lemma proven using refl. So I can test the first line automatically like so:
lemma test_sub00: sub 0 0 = 0 := rfl

Nathan Glenn (Jan 22 2023 at 13:58):

Programmer that I am, I want to be able to put all of the test data into its own structure.

Nathan Glenn (Jan 22 2023 at 13:59):

The idea behind data-driven testing is that you have one piece of testing logic that gets repeatedly applied to some list of cases. This gives you fewer points of potential bugs because you're not copy-pasting logic around.

Nathan Glenn (Jan 22 2023 at 14:00):

I'm not sure how to proceed here, though; I moved all of the test cases into a list like so:

def test_data : list (list ℕ) :=
[[0, 0, 0], [0, 1, 0], [0, 7, 0], [1, 0, 1], [1, 1, 0], [3, 0, 3],
 [2, 7, 0], [3, 1, 2], [3, 3, 0], [3, 7, 0], [7, 2, 5]]

Mario Carneiro (Jan 22 2023 at 14:00):

Note that those two are not the same. a lemma proved by rfl is evaluated by the kernel, while #eval evaluates in the interpreter/compiler. The latter is unverified but often much faster or even may succeed where the other fails

Mario Carneiro (Jan 22 2023 at 14:01):

There isn't any particular reason to organize the tests beyond having a bunch of lemmas because lemma correctness is evaluated at compile time

Nathan Glenn (Jan 22 2023 at 14:01):

Interesting. I guess that makes #eval doubly dangerous as a unit-test drop-in then, right?

Mario Carneiro (Jan 22 2023 at 14:01):

it depends on what you want to test

Mario Carneiro (Jan 22 2023 at 14:02):

you can turn the eval into a test by evaluating something like #eval guard (sub 0 0 = 0)

Mario Carneiro (Jan 22 2023 at 14:03):

I forget the exact syntax but there are ways to set it up so that the eval line itself fails if the test is wrong, and this is still being evaluated at compile time

Nathan Glenn (Jan 22 2023 at 14:03):

I don't see guard in the manual

Nathan Glenn (Jan 22 2023 at 14:04):

and it's giving me don't know how to synthesize placeholder context: ⊢ Type → Type ?

Nathan Glenn (Jan 22 2023 at 14:04):

for #eval guard (sub 0 0 = 0)

Mario Carneiro (Jan 22 2023 at 14:04):

try (guard ... : tactic unit)

Mario Carneiro (Jan 22 2023 at 14:04):

I don't think the manual has an exhaustive list of functions, you are more likely to find it in docs#guard

Nathan Glenn (Jan 22 2023 at 14:05):

#eval (guard (sub 0 0 = 0): tactic unit) works :) Thanks. So guard is like an assert statement.

Mario Carneiro (Jan 22 2023 at 14:06):

yeah

Nathan Glenn (Jan 22 2023 at 14:06):

Okay, so back to the list-based one...

Mario Carneiro (Jan 22 2023 at 14:06):

this is like static_assert in C++, not sure what your background is

Nathan Glenn (Jan 22 2023 at 14:06):

Cool, makes sense

Mario Carneiro (Jan 22 2023 at 14:06):

I'm not sure what the numbers represent

Nathan Glenn (Jan 22 2023 at 14:06):

I'm really brand new at lean

Nathan Glenn (Jan 22 2023 at 14:07):

Right, so the idea is that I want to take the arguments and expected output for each test case and put them together ina collection

Nathan Glenn (Jan 22 2023 at 14:07):

I don't really like list for this, but it was the only thing I could get to compile

Nathan Glenn (Jan 22 2023 at 14:08):

I would rather have tuples of a pre-defined length, or an object/struct with named fields

Nathan Glenn (Jan 22 2023 at 14:09):

But the idea here is that I want to do something like this:

for case in test_data
    assert sub case[0] case[1] = case[2]

Mario Carneiro (Jan 22 2023 at 14:10):

I think this would work:

def test_data : list (ℕ × ℕ × ℕ) :=
[(0, 0, 0), (0, 1, 0), (0, 7, 0), (1, 0, 1), (1, 1, 0), (3, 0, 3),
 (2, 7, 0), (3, 1, 2), (3, 3, 0), (3, 7, 0), (7, 2, 5)]

#eval show tactic unit, from
  test_data.mmap' (λ ⟨a, b, c⟩, guard (sub a b = c))

Mario Carneiro (Jan 22 2023 at 14:10):

untested as I don't have a lean 3 on hand

Nathan Glenn (Jan 22 2023 at 14:11):

Ah, that's how you make tuples!

Nathan Glenn (Jan 22 2023 at 14:11):

intuitive. Just couldn't find it by searching

Mario Carneiro (Jan 22 2023 at 14:11):

the underlying function this is notation for is docs#prod.mk

Nathan Glenn (Jan 22 2023 at 14:12):

Cool, this works. Thank you!

Nathan Glenn (Jan 22 2023 at 14:12):

There is one issue...

Nathan Glenn (Jan 22 2023 at 14:12):

Not sure if it can be fixed

Nathan Glenn (Jan 22 2023 at 14:12):

but if you change one of the test cases so that it fails

Nathan Glenn (Jan 22 2023 at 14:13):

Lean just underlines the #eval and says "fail". No info on which case failed or how it failed.

Mario Carneiro (Jan 22 2023 at 14:14):

#eval show tactic unit, from
  test_data.mmap' (λ ⟨a, b, c⟩,
    guard (sub a b = c) <|> fail format!"sub {a} {b} wasn't {c}, it was {sub a b}")

Nathan Glenn (Jan 22 2023 at 14:21):

Do I need to import fail from somewhere?

Nathan Glenn (Jan 22 2023 at 14:23):

I get unknown identifier 'fail'

Mario Carneiro (Jan 22 2023 at 14:25):

tactic.fail maybe

Mario Carneiro (Jan 22 2023 at 14:26):

or open tactic beforehand

Nathan Glenn (Jan 22 2023 at 14:26):

tactic.fail works!

Nathan Glenn (Jan 22 2023 at 14:26):

guard uses failure, which doesn't take a message argument

Nathan Glenn (Jan 22 2023 at 14:27):

Okay so one last question

Nathan Glenn (Jan 22 2023 at 14:27):

If I want to re-use this logic

Nathan Glenn (Jan 22 2023 at 14:28):

can I put this in a function that takes sub (or other binary function) as an argument?

Nathan Glenn (Jan 22 2023 at 14:29):

I'm just unsure about #eval, because it's compile time... will I hit a wall somewhere because some things cannot be computed at compile-time, or is everything computable at compile-time?

Mario Carneiro (Jan 22 2023 at 14:31):

open tactic
meta def test_binop {α β γ} [has_to_format α] [has_to_format β] [has_to_format γ] [decidable_eq γ]
  (name : string) (f : α → β → γ) (test_data : list (α × β × γ)) : tactic unit :=
test_data.mmap' (λ ⟨a, b, c⟩,
  guard (f a b = c) <|> fail format!"{name} {a} {b} wasn't {c}, it was {f a b}")

#eval test_binop "sub" sub
  [(0, 0, 0), (0, 1, 0), (0, 7, 0), (1, 0, 1), (1, 1, 0), (3, 0, 3),
   (2, 7, 0), (3, 1, 2), (3, 3, 0), (3, 7, 0), (7, 2, 5)]

still untested

Mario Carneiro (Jan 22 2023 at 14:31):

Nathan Glenn said:

I'm just unsure about #eval, because it's compile time... will I hit a wall somewhere because some things cannot be computed at compile-time, or is everything computable at compile-time?

No, basically everything that lean can evaluate it can also run in the interpreter

Mario Carneiro (Jan 22 2023 at 14:33):

There are some things that cannot be evaluated, these are marked noncomputable, but you can't run them at run time either. (You can use them in rfl proofs however, although usually that won't work either, depending on how the function is defined.)

Mario Carneiro (Jan 22 2023 at 14:34):

stuff like "use the axiom of choice to pick an even number" is not computable

Mario Carneiro (Jan 22 2023 at 14:38):

Mario Carneiro said:

No, basically everything that lean can evaluate it can also run in the interpreter

This is something you will notice that is unusual about lean compared to other programming languages - almost everything is done at compile time, and in many cases compiling a lean file already extracts all the benefit we intend to get out of the file, for example if it contains a bunch of theorems then those theorems can't be run, their only purpose is to be compiled and typechecked in order to prove the theorem

Nathan Glenn (Jan 22 2023 at 14:40):

taking me a moment to parse this...

Nathan Glenn (Jan 22 2023 at 14:41):

I get invalid definition, it uses untrusted declaration 'has_to_format' for the code you pasted

Nathan Glenn (Jan 22 2023 at 14:45):

I also tried this less general version:

def test_binary (name : string) (f : ℕ → ℕ → ℕ) (data: list (ℕ × ℕ × ℕ)) : tactic unit :=
  data.mmap' (λ ⟨a, b, c⟩,
    guard (f a b = c) <|> tactic.fail format!"{name} {a} {b} wasn't {c}, it was {f a b}")

And it gives me invalid definition, it uses untrusted declaration 'tactic'

Mario Carneiro (Jan 22 2023 at 14:49):

use meta def instead of def

Nathan Glenn (Jan 22 2023 at 14:51):

:magic:

Nathan Glenn (Jan 22 2023 at 14:51):

Thanks :D

Nathan Glenn (Jan 22 2023 at 14:54):

I've found the programming_in_lean document, so I'll give that a read-through.

Nathan Glenn (Jan 22 2023 at 19:38):

Follow-up question regarding abstracting over attributes in Lean

Nathan Glenn (Jan 22 2023 at 19:38):

I'm on to the homework assignment, and this time I need to test that a reverse function works properly.

Nathan Glenn (Jan 22 2023 at 19:39):

Reverse is defined like so:

def reverse {α : Type} : list α → list α
| []        := []
| (x :: xs) := reverse xs ++ [x]

Nathan Glenn (Jan 22 2023 at 19:39):

And I've adapted your test_binop code for a unary function like so:

meta def test_unary_op {α β} [has_to_format α] [has_to_format β] [decidable_eq β]
  (name : string) (f : α → β) (test_data : list (α × β)) : tactic unit :=
test_data.mmap' (λ ⟨a, b⟩,
  guard (f a = b) <|> tactic.fail format!"{name} {a} wasn't {b}, it was {f a}")

Nathan Glenn (Jan 22 2023 at 19:40):

Now, I can test with a list containing one type, no problem:

#eval test_unary_op "reverse" reverse
  [(([] : list ℕ), []), ([1,2,3], [3,2,1])]

Nathan Glenn (Jan 22 2023 at 19:41):

But I would love to be able to try this with multiple list types at once, like this:

#eval test_unary_op "reverse" reverse
  [(([1,2,3], [3,2,1])), (["h","e","l","l","o"],["o", "l","l","e","h"])]

Nathan Glenn (Jan 22 2023 at 19:42):

This code gives me "failed to synthesize type class", because it can't tell what type of list I'm giving it (the tuples have multiple types). But for the purposes of testing, all that matters to me is that each tuple has a consistent type among its elements. For the entire list, all I care about is that all each of the tuples satisfies decidable_eq. Is it possible to model this?

Eric Wieser (Jan 22 2023 at 20:44):

You can use a list of sigma types

Eric Wieser (Jan 22 2023 at 20:52):

But that turns out to be really annoying to write, especially since you have decidability constraints to encode.

Eric Wieser (Jan 22 2023 at 20:52):

Why do you need this? Why not just write

#eval do
    test_unary_op "reverse" reverse (([1,2,3], [3,2,1])),
    test_unary_op "reverse" reverse (["h","e","l","l","o"],["o", "l","l","e","h"])

Nathan Glenn (Jan 22 2023 at 20:56):

"need" not so much, "want" is more like it. It's just the way I'm used to programming. The requirement that I can only test one type of list at a time is just awkward to me. This particular application happens to be really simple, but I'm not so sure I won't find a larger impediment to me later, so this seemed like a good time to ask how this is done. But I can wait until I get to sigma types in the tutorials.

Eric Wieser (Jan 22 2023 at 21:07):

I don't think sigma types are going to help you here

Eric Wieser (Jan 22 2023 at 21:09):

You're trying to generalize over a lot of things at once:

Elements of the list
Things which aren't lists at all
Algorithms for printing a family of types
Algorithms for deciding equality of a family of types

Eric Wieser (Jan 22 2023 at 21:13):

It can certainly be done:

-- Note: this only works if `f` places no requirements on the family of types
meta def test_unary_op  {α β : Type* → Type*} [Π i [has_to_format i], has_to_format (α i)] [Π i [has_to_format i], has_to_format (β i)] [Π i  [decidable_eq i], decidable_eq (β i)]
  (name : string) (f : Π i, α i → β i) (test_data : list (Σ (i : Type*) [decidable_eq i] [has_to_format i], α i × β i)) : tactic unit :=
test_data.mmap' (λ ⟨i, h1, h2, a, b⟩,
  (by haveI := h1; exact guard (f _ a = b)) <|> tactic.fail (by haveI := h2; exact format!"{name} {a} wasn't {b}, it was {f _ a}"))

#eval @test_unary_op list list
  -- typeclass inference can't handle this type of thing automatically
  (λ i h, by resetI; apply_instance)
  (λ i h, by resetI; apply_instance)
  (λ i h, by resetI; apply_instance)
  "reverse" (@list.reverse)
  [⟨_, infer_instance, infer_instance, ([1,2,3], [3,2,1])⟩,
   ⟨_, infer_instance, infer_instance, (["h","e","l","l","o"],["o", "l","l","e","h"])⟩]

Eric Wieser (Jan 22 2023 at 21:14):

But the amount of boilerplate you have to write to fight lean make it not worth it

Eric Wieser (Jan 22 2023 at 21:14):

And even the above doesn't work if you want to test reverse on types in different universes

Eric Wieser (Jan 22 2023 at 21:16):

Probably a more reasonable option is to write a metaprogram to generate the repeative code I linked above

Tyler Josephson ⚛️ (Jan 22 2023 at 21:23):

As an engineering professor who teaches test-driven design to my students learning Python, I thought I’d chime in.

In Python, you write tests alongside your code to make sure it does what you want, to facilitate continuous integration, etc. A good programmer writes good, informative tests, but you’re limited to things like “check the output of this function is correct on inputs 5, 10, and 15.” What makes Lean unique is you can write proofs alongside your code, to essentially say “prove the output of this function is correct for all natural numbers, while also restricting the function to only accept natural numbers as input.” Lean checks all proofs at compile time, and in mathlib. I believe this paradigm is stronger than the old test-driven design paradigm.

But, one gap in the proofs approach is, “even if I’ve defined an object and proved something about it, how can I be sure I e defined the right thing?” Our current answer to this is to use examples, see this blog post: https://leanprover-community.github.io/blog/posts/lte-examples/

I don’t know if anyone has consolidated examples in a structure before with the intent on checking them along the lines of test driven development.

Nathan Glenn (Jan 24 2023 at 00:10):

I'm still brand-new to this, so it's really helpful to get your perspectives here. The code above is quite beyond my comprehension right now :D The point about constructing proofs being a better option than extensive tests is a good one. Thanks for the help!

Last updated: May 02 2025 at 03:31 UTC