Zulip Chat Archive

Stream: Lean for teaching

Topic: Checking equality of types between different `Environment`s

Ricardo Prado Cunha (Aug 20 2024 at 13:57):

I am writing a Gradescope autograder for Lean (the existing one wasn't quite suitable for my needs), but I'm struggling to figure out the best way to check whether the type of the submission is definitionally the exact same as that of the problem.
Looking over the existing solutions I've found, robertylewis merely compares the ConstantInfo.type Expr, which I believe enables cheating if the submission changes the definition of anything used in the type. For example, theorem exercise1 : Foo := /- ... -/ would be said to have the same type under this check, regardless of whether Foo := False or Foo := True. I think this could be fixed by having definitions be placed in another file that is required to be imported by the submission, though this is not very ergonomic (I'd rather the relevant definitions be either built-in/from Mathlib, or in the same file).
The other solution I've found is the one by adamtopaz, where the type of the submission is ignored, and instead the value is re-evaluated in a clean Environment, whose type is then compared to the correct type. The issue I see with this one is that the submission would be unable to use anything not in the clean environment, including imports (though that could be fixed with just an import Mathlib in the clean environment) and custom auxiliary lemmas.
Does anyone have any other ideas on how to solve this? I think the ideal solution would be to check for equality recursively, by confirming whether the two types have the same Expr, then checking whether anything referenced therein are also defined the same, so on and so forth, though I'm not sure how to do this.

Matthew Ballard (Aug 20 2024 at 14:34):

Some thoughts:

I think when you get to the point of trying to control a provided environment, you have already lost. The work you will put in will exceed the value you extract. You have already entered a very long tail for those returns and a clever and determined enough adversary will always push you further.
There have been improvements with lake test in the interim. For many languages, the natural entry point for auto-grading is the native testing framework.

Matthew Ballard (Aug 20 2024 at 14:40):

For reference, I have a class of 26 that works in groups of size 5-6. I am using the Batteries library test driver and some guard_msgs with GitHub classroom. The class is an introduction to cryptography. I am more concerned about them being to implement things in Lean than prove correctness. So my tests are analogous to standard tests in other languages.

Matthew Ballard (Aug 20 2024 at 14:41):

Given the number of weekly submissions, I will be looking at the code of everyone and be able to notice changes to the test files.

Matthew Ballard (Aug 20 2024 at 14:44):

All that said: mathlib uses https://github.com/leanprover/lean4checker as part of its CI

Siddhartha Gadgil (Aug 21 2024 at 04:27):

There is a function isDefEq (at Meta level) that checks whether two types are definitionally equal. Could you simply use it?

Ricardo Prado Cunha (Aug 21 2024 at 15:08):

Siddhartha Gadgil said:

There is a function isDefEq (at Meta level) that checks whether two types are definitionally equal. Could you simply use it?

Do you mean this one https://leanprover-community.github.io/mathlib4_docs/Lean/Meta/Basic.html#Lean.Meta.isDefEq ? I don't think it supports checking for equality across different environments.

Ricardo Prado Cunha (Aug 21 2024 at 15:12):

Matthew Ballard said:

I think when you get to the point of trying to control a provided environment, you have already lost. The work you will put in will exceed the value you extract. You have already entered a very long tail for those returns and a clever and determined enough adversary will always push you further.

Yeah, that's fair. It's probably worth skimming through every submission, not just to check for cheating, but also to adapt teaching and provide feedback occasionally.

Matthew Ballard said:

All that said: mathlib uses https://github.com/leanprover/lean4checker as part of its CI

Isn't this just to re-evaluate definitions in the original environment, rather than compare types across environments? Or am I misunderstanding it?

Matthew Ballard (Aug 23 2024 at 12:42):

This is in the "sanitize an environment handed to me" vein. If you are happy with the sanitization checks, then checking across environments can be avoided.

Last updated: May 02 2025 at 03:31 UTC