Zulip Chat Archive
Stream: Lean for teaching
Topic: Automatic grading of lean4 codes
Asei Inoue (Aug 23 2023 at 14:31):
Given a Lean4 project, how can I verify with a command that a theorem defined there is proved correctly without using sorry
?
Asei Inoue (Aug 23 2023 at 14:34):
For example, if the code is placed on GitHub, how can we verify that the GitHub action correctly proves all the lemmas and theorems defined in lean files?
Pedro Sánchez Terraf (Aug 23 2023 at 14:48):
Asei Inoue said:
Given a Lean4 project, how can I verify with a command that a theorem defined there is proved correctly without using
sorry
?
For this you can write
#print axioms your_theorem
in VSCode, and check sorryAx
is not on the list.
Adam Topaz (Aug 23 2023 at 14:58):
Check out https://github.com/adamtopaz/lean_grader (still under development)
Anne Baanen (Aug 23 2023 at 15:58):
In addition to #print axioms
, I would recommend also checking that the theorem statement is equal to the intended theorem statement :)
Asei Inoue (Aug 24 2023 at 10:24):
Adam Topaz said:
Check out https://github.com/adamtopaz/lean_grader (still under development)
Thank you. How can I use this lean_grader?
Asei Inoue (Aug 24 2023 at 11:11):
I would like to perform the check as a CI instead of checking on the VSCode screen. I would like to make an error if sorry
is used in the proof, and not make an error if sorry
is not used and the proof is correct.
Adam Topaz (Aug 24 2023 at 14:28):
@Asei Inoue take a look at the following: https://github.com/adamtopaz/hw_template
This is a hw template using the repo I mentioned above. To use it: clone the hw_template
repo and run lake exe grade ***
where ***
is the (lean4) hash of Nat
as an Expr
. You can use #type_hash solution
to obtain this hash (in general, if solution : X
then #type_hash solution
gives the hash of X
as an Expr
). This hash should be changed for each assignment, to match the hash of the required type of solution
.
This checks whether a declaration named solution
in the file Solution.lean
exists, whether the type of solution
has a hash matching the one provided, and whether the definition of solution
uses any axioms (sorryAx
is the axiom used when you use sorry
, so this checks for uses of sorry
, among other axioms).
Adam Topaz (Aug 24 2023 at 14:31):
As I said, this is still under development. Right now it only checks a single declaration solution
in a file called Solution.lean
in the root of the homework project directory. But I hope the code is simple enough to understand that it can be modified to suite your needs.
Adam Topaz (Aug 24 2023 at 14:32):
lake exe grade ***
will return an exit code of 0
for success, and a code of 1
for failure, so it can be directly used in github's CI as part of a script, for example.
Adam Topaz (Aug 24 2023 at 14:33):
The homework template repo also has some stuff in the .github
folder which makes it usable with github classrooms.
Asei Inoue (Aug 25 2023 at 11:32):
Thank you very much! I have tried it, but it behaves strangely. Why does "Solution is valid" message appear?
Eric Wieser (Aug 25 2023 at 11:40):
What's the purpose of the hash here? (vs getting the true statement from a golden file)
Adam Topaz (Aug 25 2023 at 12:59):
That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run lake update
Asei Inoue (Aug 25 2023 at 13:01):
Adam Topaz said:
That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run
lake update
Thanks!
Adam Topaz (Aug 25 2023 at 13:19):
@Eric Wieser I don’t know what you mean by a “golden file”.
Eric Wieser (Aug 25 2023 at 13:20):
I mean that you have a file that you ask students not to edit (or provide with the grading setup) that contains def TheIntendedStatement : Prop := whatever
and check that example : TheIntendedStatement := theStudentAnswer
typechecks
Eric Wieser (Aug 25 2023 at 13:20):
(or an analogous metaprogram)
Eric Wieser (Aug 25 2023 at 13:23):
That's essentially how codewars graded submissions in Lean 3
Adam Topaz (Aug 25 2023 at 13:24):
Yeah that would work too, but the point is that students can edit everything in the git repo, at least with GitHub classrooms.
Adam Topaz (Aug 25 2023 at 13:27):
I was trying to come up with something that would be as hands-off as possible, so I didn’t really want to check whether students modified some “untouchable” file. But at the end of the day, the students can even edit the CI files themselves, so something has to be checked.
Matthew Ballard (Aug 25 2023 at 13:29):
Have CI file that decrypts an encrypted url string and curls the real CI file ;)
Adam Topaz (Aug 25 2023 at 13:29):
But they can just change that CI file!
Matthew Ballard (Aug 25 2023 at 13:30):
The real CI file should do something random and specific
Matthew Ballard (Aug 25 2023 at 13:30):
I am only half serious here. I think if students are skilled and motivated enough to do this then you can pass them just fine
Adam Topaz (Aug 25 2023 at 13:31):
So what I’m doing now is having the CI emit the sha hash of the valid CI files. So I only have to check that those hashes match
Adam Topaz (Aug 25 2023 at 13:35):
If you’re one of my students, please forget everything you read above ;)
Matthew Ballard (Aug 25 2023 at 13:43):
I vote for "Professor Topaz, would you like to play a game?" if students dig too far in :)
Eric Wieser (Aug 25 2023 at 13:54):
Adam Topaz said:
But they can just change that CI file!
You put the CI file and grading script in a separate repository
Eric Wieser (Aug 25 2023 at 13:55):
Like I did here for a non-lean course I ran
Adam Topaz (Aug 25 2023 at 13:55):
But that line can still be changed!
Notification Bot (Aug 25 2023 at 13:55):
30 messages were moved here from #new members > Automatic grading of lean4 codes by Eric Wieser.
Eric Wieser (Aug 25 2023 at 13:58):
Adam Topaz said:
But that line can still be changed!
Sure, but they could also replace the CI script with one that does nothing to produce a green check!
Adam Topaz (Aug 25 2023 at 14:03):
Yeah exactly. This is a pretty glaring hole in GitHub classrooms IMO
Eric Wieser (Aug 25 2023 at 14:07):
You can also override your "points" to something arbitrary; I actually do this to award students things like 176/10
(aka 17.6) which is their score on our optimization problem
Arthur Paulino (Aug 25 2023 at 14:11):
Maybe this is already accounted for, but it's also necessary to check that the list of axioms used only contains a set of allowed axioms
Adam Topaz (Aug 25 2023 at 14:15):
Right now it checks if the list of axioms is empty
Adam Topaz (Aug 25 2023 at 14:15):
It would be easy to modify the script to check for a specific collection of axioms (specified by name)
Asei Inoue (Aug 25 2023 at 16:02):
why not using pull_request_target trigger?
Asei Inoue (Aug 25 2023 at 16:12):
Adam Topaz said:
That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run
lake update
After running "lake update","lake exe cache get" failed. "Not found" error occurs.
Eric Wieser (Aug 25 2023 at 16:12):
Asei Inoue said:
why not using pull_request_target trigger?
Github classroom doesn't encourage the use of pull requests against an upstream repo
Eric Wieser (Aug 25 2023 at 16:12):
Instead it makes a duplicate of the course repository with no history and no in-git link back to the template
Adam Topaz (Sep 11 2023 at 19:36):
I'm not sure if anyone is seriously using https://github.com/adamtopaz/lean_grader and the template https://github.com/adamtopaz/hw_template but just in case, I should mention that I made some significant modifications to this.
Instead of just checking that the hash of some expression matches something passed in as an argument, I now check whether two expressions are defeq. Really what's happening is that the grade
command provided by https://github.com/adamtopaz/lean_grader has two subcommands: save
and compare
.
save
will save the expression associated to a termsolution
in the fileSolution.lean
to the filesystem (using Std's pickle).compare
loads such pickles into memory and checks if they are defeq.
These commands now all use the CLI package, so usage is described using the --help
flag.
Adam Topaz (Sep 11 2023 at 19:36):
Here is a small test of the CI action for github classroom: https://github.com/adamtopaz/hw_template_test
Last updated: Dec 20 2023 at 11:08 UTC