Zulip Chat Archive
Stream: Lean for teaching
Topic: Automatic grading of lean4 codes
Asei Inoue (Aug 23 2023 at 14:31):
Given a Lean4 project, how can I verify with a command that a theorem defined there is proved correctly without using sorry
?
Asei Inoue (Aug 23 2023 at 14:34):
For example, if the code is placed on GitHub, how can we verify that the GitHub action correctly proves all the lemmas and theorems defined in lean files?
Pedro Sánchez Terraf (Aug 23 2023 at 14:48):
Asei Inoue said:
Given a Lean4 project, how can I verify with a command that a theorem defined there is proved correctly without using
sorry
?
For this you can write
#print axioms your_theorem
in VSCode, and check sorryAx
is not on the list.
Adam Topaz (Aug 23 2023 at 14:58):
Check out https://github.com/adamtopaz/lean_grader (still under development)
Anne Baanen (Aug 23 2023 at 15:58):
In addition to #print axioms
, I would recommend also checking that the theorem statement is equal to the intended theorem statement :)
Asei Inoue (Aug 24 2023 at 10:24):
Adam Topaz said:
Check out https://github.com/adamtopaz/lean_grader (still under development)
Thank you. How can I use this lean_grader?
Asei Inoue (Aug 24 2023 at 11:11):
I would like to perform the check as a CI instead of checking on the VSCode screen. I would like to make an error if sorry
is used in the proof, and not make an error if sorry
is not used and the proof is correct.
Adam Topaz (Aug 24 2023 at 14:28):
@Asei Inoue take a look at the following: https://github.com/adamtopaz/hw_template
This is a hw template using the repo I mentioned above. To use it: clone the hw_template
repo and run lake exe grade ***
where ***
is the (lean4) hash of Nat
as an Expr
. You can use #type_hash solution
to obtain this hash (in general, if solution : X
then #type_hash solution
gives the hash of X
as an Expr
). This hash should be changed for each assignment, to match the hash of the required type of solution
.
This checks whether a declaration named solution
in the file Solution.lean
exists, whether the type of solution
has a hash matching the one provided, and whether the definition of solution
uses any axioms (sorryAx
is the axiom used when you use sorry
, so this checks for uses of sorry
, among other axioms).
Adam Topaz (Aug 24 2023 at 14:31):
As I said, this is still under development. Right now it only checks a single declaration solution
in a file called Solution.lean
in the root of the homework project directory. But I hope the code is simple enough to understand that it can be modified to suite your needs.
Adam Topaz (Aug 24 2023 at 14:32):
lake exe grade ***
will return an exit code of 0
for success, and a code of 1
for failure, so it can be directly used in github's CI as part of a script, for example.
Adam Topaz (Aug 24 2023 at 14:33):
The homework template repo also has some stuff in the .github
folder which makes it usable with github classrooms.
Asei Inoue (Aug 25 2023 at 11:32):
Thank you very much! I have tried it, but it behaves strangely. Why does "Solution is valid" message appear?
Eric Wieser (Aug 25 2023 at 11:40):
What's the purpose of the hash here? (vs getting the true statement from a golden file)
Adam Topaz (Aug 25 2023 at 12:59):
That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run lake update
Asei Inoue (Aug 25 2023 at 13:01):
Adam Topaz said:
That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run
lake update
Thanks!
Adam Topaz (Aug 25 2023 at 13:19):
@Eric Wieser I don’t know what you mean by a “golden file”.
Eric Wieser (Aug 25 2023 at 13:20):
I mean that you have a file that you ask students not to edit (or provide with the grading setup) that contains def TheIntendedStatement : Prop := whatever
and check that example : TheIntendedStatement := theStudentAnswer
typechecks
Eric Wieser (Aug 25 2023 at 13:20):
(or an analogous metaprogram)
Eric Wieser (Aug 25 2023 at 13:23):
That's essentially how codewars graded submissions in Lean 3
Adam Topaz (Aug 25 2023 at 13:24):
Yeah that would work too, but the point is that students can edit everything in the git repo, at least with GitHub classrooms.
Adam Topaz (Aug 25 2023 at 13:27):
I was trying to come up with something that would be as hands-off as possible, so I didn’t really want to check whether students modified some “untouchable” file. But at the end of the day, the students can even edit the CI files themselves, so something has to be checked.
Matthew Ballard (Aug 25 2023 at 13:29):
Have CI file that decrypts an encrypted url string and curls the real CI file ;)
Adam Topaz (Aug 25 2023 at 13:29):
But they can just change that CI file!
Matthew Ballard (Aug 25 2023 at 13:30):
The real CI file should do something random and specific
Matthew Ballard (Aug 25 2023 at 13:30):
I am only half serious here. I think if students are skilled and motivated enough to do this then you can pass them just fine
Adam Topaz (Aug 25 2023 at 13:31):
So what I’m doing now is having the CI emit the sha hash of the valid CI files. So I only have to check that those hashes match
Adam Topaz (Aug 25 2023 at 13:35):
If you’re one of my students, please forget everything you read above ;)
Matthew Ballard (Aug 25 2023 at 13:43):
I vote for "Professor Topaz, would you like to play a game?" if students dig too far in :)
Eric Wieser (Aug 25 2023 at 13:54):
Adam Topaz said:
But they can just change that CI file!
You put the CI file and grading script in a separate repository
Eric Wieser (Aug 25 2023 at 13:55):
Like I did here for a non-lean course I ran
Adam Topaz (Aug 25 2023 at 13:55):
But that line can still be changed!
Notification Bot (Aug 25 2023 at 13:55):
30 messages were moved here from #new members > Automatic grading of lean4 codes by Eric Wieser.
Eric Wieser (Aug 25 2023 at 13:58):
Adam Topaz said:
But that line can still be changed!
Sure, but they could also replace the CI script with one that does nothing to produce a green check!
Adam Topaz (Aug 25 2023 at 14:03):
Yeah exactly. This is a pretty glaring hole in GitHub classrooms IMO
Eric Wieser (Aug 25 2023 at 14:07):
You can also override your "points" to something arbitrary; I actually do this to award students things like 176/10
(aka 17.6) which is their score on our optimization problem
Arthur Paulino (Aug 25 2023 at 14:11):
Maybe this is already accounted for, but it's also necessary to check that the list of axioms used only contains a set of allowed axioms
Adam Topaz (Aug 25 2023 at 14:15):
Right now it checks if the list of axioms is empty
Adam Topaz (Aug 25 2023 at 14:15):
It would be easy to modify the script to check for a specific collection of axioms (specified by name)
Asei Inoue (Aug 25 2023 at 16:02):
why not using pull_request_target trigger?
Asei Inoue (Aug 25 2023 at 16:12):
Adam Topaz said:
That’s a bug that was fixed bu I guess I forgot to update the lake manifest in the homework template. Please run
lake update
After running "lake update","lake exe cache get" failed. "Not found" error occurs.
Eric Wieser (Aug 25 2023 at 16:12):
Asei Inoue said:
why not using pull_request_target trigger?
Github classroom doesn't encourage the use of pull requests against an upstream repo
Eric Wieser (Aug 25 2023 at 16:12):
Instead it makes a duplicate of the course repository with no history and no in-git link back to the template
Adam Topaz (Sep 11 2023 at 19:36):
I'm not sure if anyone is seriously using https://github.com/adamtopaz/lean_grader and the template https://github.com/adamtopaz/hw_template but just in case, I should mention that I made some significant modifications to this.
Instead of just checking that the hash of some expression matches something passed in as an argument, I now check whether two expressions are defeq. Really what's happening is that the grade
command provided by https://github.com/adamtopaz/lean_grader has two subcommands: save
and compare
.
save
will save the expression associated to a termsolution
in the fileSolution.lean
to the filesystem (using Std's pickle).compare
loads such pickles into memory and checks if they are defeq.
These commands now all use the CLI package, so usage is described using the --help
flag.
Adam Topaz (Sep 11 2023 at 19:36):
Here is a small test of the CI action for github classroom: https://github.com/adamtopaz/hw_template_test
Sergey Cherkis (Jan 07 2024 at 04:27):
New to GitHub Codespace and GitHub Classroom. How does one setup auto graded Lean 4 Mathlib homework assignments using these?
To clarify: Found https://leanprover-community.github.io/teaching/resources.html#lean-in-the-cloud-setups
and A GitHub Classrooms autograder for Lean 4. Would love to know how to use autogravder and lean_grader or any other alternative.
Adam Topaz (Jan 07 2024 at 16:41):
@Sergey Cherkis I should add a warning that my repo is not quite ready for general use. I did use it in my formalization course last term but there are a few unfortunate bugs that I haven't had a chance to fix yet (namely, if a student adds a variable
command somewhere in the file, the behavior of lean4 makes it so that the autograding breaks even if a valid solution is present).
Adam Topaz (Jan 07 2024 at 16:50):
Plus, I think I read somewhere that there are some upcoming changes to how github classrooms works, and I haven't looked into that whatsoever.
Heather Macbeth (Jan 07 2024 at 17:18):
I have been happy with Gradescope (and Rob's autograder for it), but it is for-pay. The other day I was looking around at what is available for free and found OK and CodePost. Has anyone tried either?
Sergey Cherkis (Jan 08 2024 at 01:26):
Heather Macbeth said:
I have been happy with Gradescope (and Rob's autograder for it), but it is for-pay. The other day I was looking around at what is available for free and found OK and CodePost. Has anyone tried either?
Would OK and CodePost be in some way better than github classrooms ?
With any of these, where does one begin to learn how to use them for teaching math with Lean?
You have this beautiful Math2001 setup. What are the relevant tools (in order) and how does one learn to use them?
(So far I managed to get Codespace working a MIL repo and got a Classroom account.)
Heather Macbeth (Jan 08 2024 at 02:12):
@Sergey Cherkis Gradescope is a bit easier for the students to use, in my opinion. Gradescope supports having the students upload just a single file containing the homework assignment, which will then be compiled in a standard environment (lakefile, etc.) provided to Gradescope by you, the instructor. This means the students can do the whole semester's work in one repository (one file per homework assignment), so once they have successfully installed/set up at the beginning of the semester, not much can go wrong.
With GitHub Classrooms, I believe the students need to download/set up a separate template project for every homework assignment. They also need to use git to record their submission as a series of commits on top of this template project.
The GitHub Classrooms model seemed to have more points of possible failure, especially for the kind of class I teach (first- and second-year students who don't necessarily have programming experience). This is why, if I move away from Gradescope, I would explore alternatives to GitHub Classrooms.
On the other hand, @Matthew Ballard has made the case (somewhere in an earlier thread) that these points of possible failure are actually learning opportunities, since understanding git and GitHub is itself a useful skill.
Sergey Cherkis (Jan 29 2024 at 17:24):
Thank you, @Heather Macbeth !
Last updated: May 02 2025 at 03:31 UTC