Zulip Chat Archive

Stream: Lean for teaching

Topic: Automatic grading of lean4 codes


Asei Inoue (Aug 23 2023 at 14:31):

Given a Lean4 project, how can I verify with a command that a theorem defined there is proved correctly without using sorry?

Asei Inoue (Aug 23 2023 at 14:34):

For example, if the code is placed on GitHub, how can we verify that the GitHub action correctly proves all the lemmas and theorems defined in lean files?

Pedro Sánchez Terraf (Aug 23 2023 at 14:48):

Asei Inoue said:

Given a Lean4 project, how can I verify with a command that a theorem defined there is proved correctly without using sorry?

For this you can write

#print axioms your_theorem

in VSCode, and check sorryAxis not on the list.

Adam Topaz (Aug 23 2023 at 14:58):

Check out https://github.com/adamtopaz/lean_grader (still under development)

Anne Baanen (Aug 23 2023 at 15:58):

In addition to #print axioms, I would recommend also checking that the theorem statement is equal to the intended theorem statement :)

Asei Inoue (Aug 24 2023 at 10:24):

Adam Topaz said:

Check out https://github.com/adamtopaz/lean_grader (still under development)

Thank you. How can I use this lean_grader?

Asei Inoue (Aug 24 2023 at 11:11):

I would like to perform the check as a CI instead of checking on the VSCode screen. I would like to make an error if sorry is used in the proof, and not make an error if sorry is not used and the proof is correct.

Adam Topaz (Aug 24 2023 at 14:28):

@Asei Inoue take a look at the following: https://github.com/adamtopaz/hw_template

This is a hw template using the repo I mentioned above. To use it: clone the hw_template repo and run lake exe grade *** where *** is the (lean4) hash of Nat as an Expr. You can use #type_hash solution to obtain this hash (in general, if solution : X then #type_hash solution gives the hash of X as an Expr). This hash should be changed for each assignment, to match the hash of the required type of solution.

This checks whether a declaration named solution in the file Solution.lean exists, whether the type of solution has a hash matching the one provided, and whether the definition of solution uses any axioms (sorryAx is the axiom used when you use sorry, so this checks for uses of sorry, among other axioms).

Adam Topaz (Aug 24 2023 at 14:31):

As I said, this is still under development. Right now it only checks a single declaration solution in a file called Solution.lean in the root of the homework project directory. But I hope the code is simple enough to understand that it can be modified to suite your needs.

Adam Topaz (Aug 24 2023 at 14:32):

lake exe grade *** will return an exit code of 0 for success, and a code of 1 for failure, so it can be directly used in github's CI as part of a script, for example.

Adam Topaz (Aug 24 2023 at 14:33):

The homework template repo also has some stuff in the .github folder which makes it usable with github classrooms.

Asei Inoue (Aug 25 2023 at 11:32):

Thank you very much! I have tried it, but it behaves strangely. Why does "Solution is valid" message appear?

screenshot_lean_grader.png

Eric Wieser (Aug 25 2023 at 11:40):

What's the purpose of the hash here? (vs getting the true statement from a golden file)

Adam Topaz (Aug 25 2023 at 12:59):

That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run lake update

Asei Inoue (Aug 25 2023 at 13:01):

Adam Topaz said:

That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run lake update

Thanks!

Adam Topaz (Aug 25 2023 at 13:19):

@Eric Wieser I don’t know what you mean by a “golden file”.

Eric Wieser (Aug 25 2023 at 13:20):

I mean that you have a file that you ask students not to edit (or provide with the grading setup) that contains def TheIntendedStatement : Prop := whatever and check that example : TheIntendedStatement := theStudentAnswer typechecks

Eric Wieser (Aug 25 2023 at 13:20):

(or an analogous metaprogram)

Eric Wieser (Aug 25 2023 at 13:23):

That's essentially how codewars graded submissions in Lean 3

Adam Topaz (Aug 25 2023 at 13:24):

Yeah that would work too, but the point is that students can edit everything in the git repo, at least with GitHub classrooms.

Adam Topaz (Aug 25 2023 at 13:27):

I was trying to come up with something that would be as hands-off as possible, so I didn’t really want to check whether students modified some “untouchable” file. But at the end of the day, the students can even edit the CI files themselves, so something has to be checked.

Matthew Ballard (Aug 25 2023 at 13:29):

Have CI file that decrypts an encrypted url string and curls the real CI file ;)

Adam Topaz (Aug 25 2023 at 13:29):

But they can just change that CI file!

Matthew Ballard (Aug 25 2023 at 13:30):

The real CI file should do something random and specific

Matthew Ballard (Aug 25 2023 at 13:30):

I am only half serious here. I think if students are skilled and motivated enough to do this then you can pass them just fine

Adam Topaz (Aug 25 2023 at 13:31):

So what I’m doing now is having the CI emit the sha hash of the valid CI files. So I only have to check that those hashes match

Adam Topaz (Aug 25 2023 at 13:35):

If you’re one of my students, please forget everything you read above ;)

Matthew Ballard (Aug 25 2023 at 13:43):

I vote for "Professor Topaz, would you like to play a game?" if students dig too far in :)

Eric Wieser (Aug 25 2023 at 13:54):

Adam Topaz said:

But they can just change that CI file!

You put the CI file and grading script in a separate repository

Eric Wieser (Aug 25 2023 at 13:55):

Like I did here for a non-lean course I ran

Adam Topaz (Aug 25 2023 at 13:55):

But that line can still be changed!

Notification Bot (Aug 25 2023 at 13:55):

30 messages were moved here from #new members > Automatic grading of lean4 codes by Eric Wieser.

Eric Wieser (Aug 25 2023 at 13:58):

Adam Topaz said:

But that line can still be changed!

Sure, but they could also replace the CI script with one that does nothing to produce a green check!

Adam Topaz (Aug 25 2023 at 14:03):

Yeah exactly. This is a pretty glaring hole in GitHub classrooms IMO

Eric Wieser (Aug 25 2023 at 14:07):

You can also override your "points" to something arbitrary; I actually do this to award students things like 176/10 (aka 17.6) which is their score on our optimization problem

Arthur Paulino (Aug 25 2023 at 14:11):

Maybe this is already accounted for, but it's also necessary to check that the list of axioms used only contains a set of allowed axioms

Adam Topaz (Aug 25 2023 at 14:15):

Right now it checks if the list of axioms is empty

Adam Topaz (Aug 25 2023 at 14:15):

It would be easy to modify the script to check for a specific collection of axioms (specified by name)

Asei Inoue (Aug 25 2023 at 16:02):

why not using pull_request_target trigger?

Asei Inoue (Aug 25 2023 at 16:12):

Adam Topaz said:

That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run lake update

After running "lake update","lake exe cache get" failed. "Not found" error occurs.

Eric Wieser (Aug 25 2023 at 16:12):

Asei Inoue said:

why not using pull_request_target trigger?

Github classroom doesn't encourage the use of pull requests against an upstream repo

Eric Wieser (Aug 25 2023 at 16:12):

Instead it makes a duplicate of the course repository with no history and no in-git link back to the template

Adam Topaz (Sep 11 2023 at 19:36):

I'm not sure if anyone is seriously using https://github.com/adamtopaz/lean_grader and the template https://github.com/adamtopaz/hw_template but just in case, I should mention that I made some significant modifications to this.

Instead of just checking that the hash of some expression matches something passed in as an argument, I now check whether two expressions are defeq. Really what's happening is that the grade command provided by https://github.com/adamtopaz/lean_grader has two subcommands: save and compare.

  1. save will save the expression associated to a term solution in the file Solution.lean to the filesystem (using Std's pickle).
  2. compare loads such pickles into memory and checks if they are defeq.

These commands now all use the CLI package, so usage is described using the --help flag.

Adam Topaz (Sep 11 2023 at 19:36):

Here is a small test of the CI action for github classroom: https://github.com/adamtopaz/hw_template_test


Last updated: Dec 20 2023 at 11:08 UTC