Zulip Chat Archive

Stream: Lean for teaching

Topic: autograding with GH classroom


Matthew Ballard (Jan 27 2022 at 17:42):

I put together a basic auto-grading setup for Lean homework assignments using GitHub classroom at https://github.com/c0ppelius/lean-autograding

Matthew Ballard (Jan 27 2022 at 17:46):

The actual testing can certainly be improved. The main value of the repo is to show how to integrate calls to Lean into the GH classroom setup.

Heather Macbeth (Jan 27 2022 at 18:03):

Darn, I hacked together something literally last night :)

Heather Macbeth (Jan 27 2022 at 18:04):

Anyway, thank you for this (I am sure) far more professional setup!

Matthew Ballard (Jan 27 2022 at 18:04):

Great minds :)

Matthew Ballard (Jan 27 2022 at 18:05):

Did you use GitHub Actions?

Heather Macbeth (Jan 27 2022 at 18:05):

Mine is for Gradescope, although it's probably not so different.

Matthew Ballard (Jan 27 2022 at 18:06):

That seems like it might have better LMS support

Matthew Ballard (Jan 27 2022 at 18:07):

It also has the advantage of not using git

Heather Macbeth (Jan 27 2022 at 18:12):

We'll see how my students find it, but the interface seems pretty easy, they just upload a single Lean file with a specified name.

Heather Macbeth (Jan 27 2022 at 18:15):

And I see we came up with the same mechanism for scoring: first fail globally if the file contains errors or warnings, then apply a series of tests to verify whether each exercise has the type it's supposed to.

Matthew Ballard (Jan 27 2022 at 18:21):

Ideally one would parse the document and test whether there is a proof of the correct type somewhere in there. If you wanted to give more open ended assignments.

Matthew Ballard (Jan 27 2022 at 18:23):

Things like axiom's and constant's could be filtered depending on if they came from source file, eg you have to prove X but can assume and use Y,Z and not worry about their proofs

Matthew Ballard (Jan 27 2022 at 18:23):

Did you include hints or anything in your message to students?

Heather Macbeth (Jan 27 2022 at 21:04):

@Matthew Ballard I didn't include any check to filter out axioms. My class is small enough that I plan to glance over everyone's code anyway, so I would catch that sort of thing visually. It would be very nice to have this check, though (you do this?).

And I also don't include any hints, except that in the "fails globally" case here:
Heather Macbeth said:

And I see we came up with the same mechanism for scoring: first fail globally if the file contains errors or warnings, then apply a series of tests to verify whether each exercise has the type it's supposed to.

it posts a message saying that it has failed globally due to an error/warning in the file, so that a student with a submission containing 4 correct problems doesn't get confused about why her score is 0.

Heather Macbeth (Jan 27 2022 at 21:05):

However, the proof is in the pudding! I am curious to see how it works when tried on actual students :)

Johan Commelin (Jan 28 2022 at 03:32):

If the proof is in the pudding, where is all that pudding in mathlib?? :yum:

Matthew Ballard (Feb 07 2022 at 21:36):

Continuing with this thread, I have some meta code that checks for a term of a given type anywhere in assignment.lean which removes the requirement that students name their solutions strictly.

Matthew Ballard (Feb 07 2022 at 21:37):

I still really wish I could pull the axioms without passing through stdout.

Matthew Ballard (Feb 07 2022 at 21:38):

import tactic
import assignment

variable T : Type

open tactic

meta def in_import (env : environment) (n : name) (path : string) : bool :=
(env.decl_olean n = path) && env.contains n && (n  [``quot, ``quot.mk, ``quot.lift, ``quot.ind])

meta def exact_list : list name  tactic unit
| [] := failed
| (H :: Hs) := do
                  e  mk_const H,
                  exact e <|> exact_list Hs

meta def check_solutions : tactic unit :=
do env <- get_env,
   let decls := env.fold [] list.cons,
   cwd  unsafe_run_io io.env.get_cwd,
   let names := decls.map declaration.to_name,
   let assignment_names := names.filter
     (λ x, in_import env x (cwd ++ "/src/assignment.lean") && not x.is_internal),
   exact_list assignment_names

theorem check : T  :=
begin
  check_solutions,
end

Matthew Ballard (Feb 08 2022 at 22:00):

Update: lean X.lean dumps everything to stdout. stderr is empty. So the GH autograding action only cares about the exit code of the process.

Gihan Marasingha (Aug 23 2022 at 12:35):

Great stuff! I've been away from Zulip for a few months and it's amazing to see what's happening. I'm actually planning to use Git Hub Classrooms with Lean for the new academic year.

Eric Wieser (Aug 24 2022 at 09:06):

I recently used github classrooms for a non-Lean course; one tip I'd give would be to put a very minimal action in the student repo that references the real action, so that you can fix bugs in it without having to tell students how to update their git repository to get the new grader. That looks something like:

jobs:
  build:
    # this name must be exactly this for the points to be reported to github classroom
    name: Autograding
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3
    - name: Run grading
      uses: my-classrooms/my_class_action@2022  # version by cohort so as not to break submissions next year

where my_class_action is a github "composite action"

Matthew Ballard (Aug 24 2022 at 15:30):

Gihan Marasingha said:

Great stuff! I've been away from Zulip for a few months and it's amazing to see what's happening. I'm actually planning to use Git Hub Classrooms with Lean for the new academic year.

Thanks. I am not sure if this is in a completely working state currently. I am trying out Lean 4 for my class so will have to rewrite it that in a week or so.

Matthew Ballard (Aug 24 2022 at 15:35):

This semester is my first honest rodeo with GH classrooms so I am noticing some sharp edges. For example, it appears that instructors cannot set student groups for assignments. In fact, an instructor cannot create the (empty) named groups themselves. All group creation and membership is on the student end. There is a time-intensive workaround to create the empty groups but it still relies on students correctly joining their assigned group. You cannot fix a mistake without dumping the whole assignment and starting over.

Matthew Ballard (Aug 24 2022 at 15:36):

Perhaps people here are better versed, but is this really the state of things for student group creation/curation?

Mike Shulman (Aug 29 2022 at 18:14):

I'm trying to use Lean 3 with gradescope, but I'm confused about how to deal with sorry. It seems that definitions using sorry produce only a warning, not an error, and in particular don't set a positive exit-code, so I'm not sure how to test for them programmatically. Is there a flag I can give to Lean to tell it to treat sorry as an error?

Mike Shulman (Aug 29 2022 at 18:15):

I tried to figure out from the posted GH code how you're dealing with this, but I couldn't even figure out how lean is getting called or its return values parsed.

Mike Shulman (Aug 29 2022 at 18:15):

Would anyone who's used gradescope be willing to share their autograder setup?

Alex J. Best (Aug 29 2022 at 18:25):

I think you can use trust level 0 -t0 flag to lean to get this behaviour, not sure what other effects this will have though

Matthew Ballard (Aug 29 2022 at 18:25):

Hi @Mike Shulman long time no see. Here is the relevant function. The classroom json file just calls lean .test/test.lean and expects nothing in response in stdout.

Mike Shulman (Aug 29 2022 at 18:28):

@Alex J. Best Thanks, that seems to work! It wasn't at all clear to me from the lean --help output that "do not trust any macro" included "do not allow sorry", but maybe sorry is a macro or something?

Mike Shulman (Aug 29 2022 at 18:30):

@Matthew Ballard Thanks. You appear to be doing some fancy coding in Lean that I don't understand. What does your script do if the student hasn't solved that exercise?

Alex J. Best (Aug 29 2022 at 18:31):

Yes this is not obvious from the help menu, but we've had this discussion on here somewhere before I think, and also wondered a lot about how to really check that people used no extra axioms for codeforces-like online challenges (its a long story and very hard to do right it turns out)

Matthew Ballard (Aug 29 2022 at 18:32):

(Again from memory but) I tested this and it should fail because it yields either an error because there is no statement with the appropriate name or the proof is not complete and it is sorry'd by default

Matthew Ballard (Aug 29 2022 at 18:33):

At the end of the rabbit hole was "you need to inspect the oleans"

Matthew Ballard (Aug 29 2022 at 18:35):

Anything beyond that can be gamed but if you just want something plausible, this was enough for me. (Currently I have go test file in the main directory for my cryto class to give you an idea of my level of concern ...)

Mike Shulman (Aug 29 2022 at 18:36):

Ok, thanks! I think I'm just going to test each exercise with a separate testing file run under -t0, so that I can test the exit codes rather than have to parse the textual output.

Matthew Ballard (Aug 29 2022 at 18:37):

Mine was mainly an exercise for myself in meta-programming :)

Mike Shulman (Aug 29 2022 at 18:37):

At least, until someone writes a gradescope-utils package in Lean that will automatically produce JSON... (-:O

Matthew Ballard (Aug 29 2022 at 18:38):

I am going to take another run in Lean 4 later this week and will report back. (Though not in gradescope)

Matthew Ballard (Aug 29 2022 at 18:58):

I now remember the concern that led me here: I didn’t want students to declare and use convenient axioms.

Mike Shulman (Aug 29 2022 at 20:06):

I tried that, but it seemed that student-declared axioms are caught by the computability checker: anything depending on an axiom has to be flagged noncomputable, and my test definitions are not so flagged, hence raise an error. But I don't know much about how Lean works; is there an easy way that students could defeat that?

Matthew Ballard (Aug 29 2022 at 20:12):

Label them noncomputable?

Mike Shulman (Aug 29 2022 at 20:52):

But an error will still be raised if my functions in the testing file aren't labeled noncomputable, and the students can't change those.

Matthew Ballard (Aug 29 2022 at 21:02):

This is good enough for Codewars so is probably good enough for coursework.

Matthew Ballard (Aug 29 2022 at 21:06):

If you are going to look at the submissions yourself, then the cost to benefit ratio tips over after just a simple sorry check.

Mario Carneiro (Aug 30 2022 at 00:03):

noncomputable is not a good defense against axiom use, because you are allowed to use axioms that prove propositions in computable functions. So a student could just have axiom my_sorry : false and they could even use it in a function implementation without it getting marked noncomputable.

Mario Carneiro (Aug 30 2022 at 00:12):

axiom my_sorry : false
def my_computable_def : nat  nat := my_sorry.elim

Siddhartha Gadgil (Aug 30 2022 at 11:07):

May be a silly question, but what if a student uses sorryAx manually instead of using the sorry macro?

Matthew Ballard (Aug 30 2022 at 11:31):

Codewars and test.leanboth implement the two step check:

  • have Lean check the proof
  • inspect the axioms to make sure nothing out of bounds is used

Codewars parses the stdout/stderr using some JavaScript. The test file in the repo does axiom inspection in Lean itself.

Matthew Ballard (Aug 30 2022 at 11:37):

General consensus from before was that this could be defeated but then a student probably would have learned enough in the process.

Matthew Ballard (Aug 30 2022 at 11:50):

Both @Rob Lewis and @Heather Macbeth have experience with gradescope in particular. They might know more about talking to it

Patrick Massot (Aug 30 2022 at 11:53):

All this becomes a real issue only if someone smart figures out a way to cheat and then tell everybody about it.

Matthew Ballard (Aug 30 2022 at 11:56):

We did make this stream public :smile:

Matthew Ballard (Aug 30 2022 at 12:01):

But I would think CS students are fairly familiar with looking for the test file, seeing the test cases, and then fitting a function to those.

Matthew Ballard (Aug 30 2022 at 12:03):

If you are looking at the submission, then this fails quickly though.

Mike Shulman (Aug 31 2022 at 16:08):

So you're saying there's no way to extract the information from lean about what axioms are used other than something like #print axioms and then parsing its textual output?

Alex J. Best (Aug 31 2022 at 16:12):

You can do it with lean meta-programming, e.g. as seen in https://github.com/leanprover-community/mathlib/issues/10954

Matthew Ballard (Aug 31 2022 at 16:40):

test.lean is a specific lean 3 internal implementation of the meta-programming approach Hmm I thought I did a meta-programming one but it appears it took the easy way out :)

Matthew Ballard (Aug 31 2022 at 16:50):

Ah right @Mario Carneiro gave that implementation (also seen here https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/intuitionistic.20logic/near/224372208) and I put adding it to the repository on my to-do list (and never took it off)

Matthew Ballard (Aug 31 2022 at 17:05):

Also interacting with GitHub Classroom (and Gradescope too?) occurs via stdout/stderr anyway: give some json specifying which command to run and what to compare the output to.

Mike Shulman (Aug 31 2022 at 17:09):

Gradescope just runs a script that you specify, and expects the results to be placed in a specified JSON file. It doesn't do any output comparison, it expects a particular JSON format that specifies the score.

Matthew Ballard (Aug 31 2022 at 17:14):

Oh, it appears they have libraries in common languages to help with the file creation. Right? Or is it just Python?

Matthew Ballard (Aug 31 2022 at 17:17):

Hmm, even the Java one dumps to stdout and saves the result in a json file.

Mike Shulman (Aug 31 2022 at 17:39):

Yes, they have libraries that integrate with testing frameworks in Python and maybe a few other languages, but the end result is always a json file of the same format.

Mike Shulman (Sep 08 2022 at 21:36):

After actually trying this, I realized that my original question about sorry wasn't even what I really wanted anyway. Using -t0 answers that question, but raises an error if any function in the file uses sorry, when actually I would like them to be able to leave the sorry there for some questions and still have the others graded. So I guess I need to use print axioms even to detect sorry.

Gihan Marasingha (Sep 25 2022 at 16:26):

I'm using a Windows machine at the moment and @Matthew Ballard's test.lean file contains the string/src/assignment.lean, which Windows doesn't like!

Is there a way for Lean to use slashes appropriate to the OS or do something else to get the correct path?

Matthew Ballard (Sep 26 2022 at 16:01):

Does switching to \src\assignment.lean fix the issue?

Gihan Marasingha (Sep 26 2022 at 17:00):

Matthew Ballard said:

Does switching to \src\assignment.lean fix the issue?

Yes it does. I was wondering if there was an OS-agnostic solution. It's not a big issue if there isn't one!

Matthew Ballard (Sep 26 2022 at 17:57):

You should be able to match off the head of the result of io.get_cwd to handle different cases. Whether there is a builtin function for constructing a system path from a list, I am not sure.

Sarah Smith (Oct 03 2022 at 20:12):

Hi @Matthew Ballard , did you end up re-writing your autograder for Lean 4? Do you still plan to? https://github.com/mattrobball/lean-autograding

Matthew Ballard (Oct 03 2022 at 20:33):

I have something that I am currently using but the functionality is far from what I would like given the capabilities of Lean 4. The main limitation is time.

You can see it at work here.

Adam Topaz (Oct 07 2022 at 22:01):

https://twitter.com/derKha/status/1578493681304494080?s=20&t=xJ4T-mPBhLLOFn4RwICxAQ

Adam Topaz (Oct 07 2022 at 22:02):

Just pointing this out because the questions seem very similar to https://github.com/UofSC-Fall-2022-Math-300-H01/homework5/blob/main/Hw5.lean

Sebastian Ullrich (Oct 08 2022 at 08:19):

See also https://leanprover.zulipchat.com/#narrow/stream/116290-rss/topic/Newest.20questions.20tagged.20lean.20-.20Stack.20Overflow/near/302931112 where I took it from :)

Matthew Ballard (Oct 10 2022 at 17:34):

We talked about this in class today. Sinners-in the-hands-of-an-angry-god-style. :wink:

Matthew Ballard (Oct 10 2022 at 17:36):

But seriously, the most upsetting issue was the person on twitter that claimed this was from a freshmen CS course on discrete math. Really reminds me not to believe anything I read on the internet.

I should have looked more closely. These were familiar not because they were all in the assignment but because they came from Chapter 4 exercises in Logic and Proof!


Last updated: Dec 20 2023 at 11:08 UTC