Zulip Chat Archive

Stream: general

Topic: Proving for Fun


maximilian p.l. haslbeck (Jul 30 2019 at 07:05):

Dear Lean enthusiasts,

I'd like to advertise to you "Proving for Fun" -- an online platform where you can tackle proving challenges in your favorite proof assistant -- and ask for your help adding Lean as a supported language.

https://competition.isabelle.systems/competitions/contest/11/

This month's contest features freshly implemented support for Coq. This means you can now submit your solutions in both Isabelle and Coq. We want to support more ITPs and ACL2 is about to join. The contest is open until 9 August 2019.

Thanks to @Johannes Hölzl and @Kevin Kappelmann, we now also have a judging backend for Lean. We have translations for three of the tasks of this month's contest and they are up for your solutions. Please feel free to give feedback on the system and help translating the rest of the tasks; in the latter case please contact me here at zulip or via email (haslbema@in.tum.de).

This month's contest also serves as a warmup for the "Proof Ground" workshop, which
will take place in September alongside the ITP conference in Portland. The goal of the workshop is to bring together researchers from the ITP
community, to discuss and compete in a "proving contest", and it will use the Proving for Fun platform. If you happen to have ideas for
interesting tasks, please consider submitting them following our "Call for Problems".

We believe our problems should be a stimulating fun summer activity! We encourage you to participate and give us feedback on the prototype system.

Happy proving!

Kevin Kappelmann (Jul 30 2019 at 08:08):

Caveat: There is no support for mathlib (yet). Feel free to contribute here if you think that Lean without mathlib is the same as prohibiting the boxer the use of his fists.

Alex J. Best (Jul 30 2019 at 19:34):

Very cool, this is such a great idea! I think the lean system needs some extra checks adding though :rolling_on_the_floor_laughing: Screen-Shot-2019-07-30-at-15.31.26.png

Kevin Kappelmann (Jul 30 2019 at 19:42):

Oh oh, I assume that's a bug in my code and you did not actually break Lean :laughing: can you send me the code that made it pass so I can check what's wrong?

Alex J. Best (Jul 30 2019 at 19:48):

You are right I didn't break lean fortunately, its a bit cheeky:

notation `false` := true

theorem soundness_bug : false := trivial

Kevin Kappelmann (Jul 30 2019 at 19:53):

hahaha, damn! that's indeed very cheeky :big_smile:

maximilian p.l. haslbeck (Jul 31 2019 at 08:27):

hehe, yeah the task is called "Break the System" :), first time we confronted our colleagues with the Isabelle version of that task they came up with lots of ways to trick the system. I already expected that to happen for the first version of the lean judge. For Isabelle we have a blacklist of keywords that may not occur in the submission theory.

maximilian p.l. haslbeck (Jul 31 2019 at 08:27):

what keywords other than "notation" should be blacklisted for lean?

Johan Commelin (Jul 31 2019 at 08:29):

I paraphrase: "Lean without notation is the same as prohibiting the boxer the use of his fists."

Chris Hughes (Jul 31 2019 at 08:29):

I'm guessing the intended solution to the first problem wasn't 0, 500, 500

maximilian p.l. haslbeck (Jul 31 2019 at 08:33):

I'm guessing the intended solution to the first problem wasn't 0, 500, 500

ah, right, I just updated the Definition of pythagorean_triple, thanks

Chris Hughes (Jul 31 2019 at 08:38):

Now I have to prove 375^2 + 200^2 = 425^2 without mathlib. Looks like @Alex J. Best used the same trick as me, we both just moved down the leaderboard.

Johan Commelin (Jul 31 2019 at 08:40):

This is called "Proving for Fun" :interrobang:

Johan Commelin (Jul 31 2019 at 08:40):

¬mathlib → ¬calculator

maximilian p.l. haslbeck (Jul 31 2019 at 08:41):

so, we should add mathlib I guess? which version of it? is there an obvious choice?

Alex J. Best (Jul 31 2019 at 08:41):

Looks like Alex J. Best used the same trick as me, we both just moved down the leaderboard.

Unfortunately I wasn't that sneaky, but my proof was dependent on the original definition so it still broke :persevere:

Johan Commelin (Jul 31 2019 at 08:46):

Can't you use CI to always give users the latest and greatest?

Johan Commelin (Jul 31 2019 at 08:47):

Otherwise I would pick whatever the lean-3.4.2 branch points at right now.

Johan Commelin (Jul 31 2019 at 08:47):

I guess for most of these problems the basics of mathlib will suffice.

Rob Lewis (Jul 31 2019 at 08:49):

I think it would be very confusing for users if the mathlib version kept changing. Probably better to pick one and stick with it (for a while at least).

Johan Commelin (Jul 31 2019 at 08:52):

The next version of mathlib will ship a fun tactic that solves these exercises for you (-;

maximilian p.l. haslbeck (Jul 31 2019 at 16:15):

¬mathlib → ¬calculator

thanks to @Kevin Kappelmann mathlib (version 2019-07-31) is now available for the system, Lean version is: 3.4.2

maximilian p.l. haslbeck (Jul 31 2019 at 16:18):

if anybody is interested in translating the tasks "XOR" and "A funky grammar", please contact me! :) we can provide sample solutions in Isabelle and Coq!

Kevin Kappelmann (Jul 31 2019 at 16:18):

XOR and funky grammar.

Kevin Buzzard (Jul 31 2019 at 16:19):

They are surprisingly similar ;-)

Kevin Buzzard (Jul 31 2019 at 16:20):

Why can't we just do some collaborative effort here?

Kevin Kappelmann (Jul 31 2019 at 16:21):

They are surprisingly similar ;-)

Fixed :')

maximilian p.l. haslbeck (Aug 01 2019 at 08:40):

Why can't we just do some collaborative effort here?

I like the idea, but I can not see how to manage this. I'm reluctant to publish solutions before the contest ends. so I'd only like to disclose the solutions to someone who wants to do the translation.

Kevin Buzzard (Aug 01 2019 at 08:41):

Oh gosh I'm sorry, I hadn't internalised that this was still an open competition!

maximilian p.l. haslbeck (Aug 12 2019 at 13:31):

The "July 2019" contest at Proving for Fun is over now.
Congratulations to the winners and thanks to all the participants.
We hope you had some fun solving this month's problems and proving them formally.

We have set up a github repository with a summary of the contest, sample solutions and discussions about the problems. Feel free to participate in commenting any task in its issue and publishing your solution with a pull-request.

We specially want to thank the people who helped making the contest possible by creating the tasks and translating them to their proof assistant: Armaël Guéneau, Sebastiaan Joosten, Kevin Kappelmann, Simon Wimmer.

You still can submit your solutions for the tasks of the "July 2019" contest, they are now integrated into the "All Time" contest.

Tomorrow, Aug 13 at noon (MESZ) we start the "August 2019" contest which will be open until Sep 12 midnight. It also serves as a preparation for the contest happening at the Proof Ground workshop Sep 13 alongside the ITP conference in Portland.

Kevin Buzzard (Aug 12 2019 at 13:38):

The best Lean user were alexejbest shortly followed by chrishughes24 with 3 solved tasks: both solving two tasks with Lean and tricking our system.

Ha! Never trust a Lean user ;-)

Kevin Buzzard (Aug 12 2019 at 13:49):

(pointing out typo which is now fixed)

maximilian p.l. haslbeck (Aug 12 2019 at 13:50):

yeah, Aug 13 is correct ^^. thanks for pointing that out.

Simon Hudon (Aug 12 2019 at 14:45):

Is it still possible to use Lean to solve the problems?

Kevin Kappelmann (Aug 12 2019 at 14:47):

Yep, all problems will also be available in Lean :)

Kevin Buzzard (Aug 12 2019 at 15:04):

Many thanks Kevin! I see you were also involved with translating the previous problems into Lean

maximilian p.l. haslbeck (Aug 13 2019 at 10:04):

August 2019 Contest is open for submissions now. Have fun :) !

Kevin Kappelmann (Aug 13 2019 at 10:51):

...small Lean hiccup at the necklace problem. the check file needs to be corrected (submissions will automatically be re-graded for you) - sorry!

Kevin Kappelmann (Aug 13 2019 at 11:42):

...small Lean hiccup at the necklace problem. the check file needs to be corrected (submissions will automatically be re-graded for you) - sorry!

Now fixed - congrats @Chris Hughes for solving the first problem :)

Chris Hughes (Aug 13 2019 at 12:04):

Both statements for the second problem in Lean are false.

Kevin Buzzard (Aug 13 2019 at 12:12):

Maybe August's contest is supposed to be more challenging.

Kevin Kappelmann (Aug 13 2019 at 12:13):

Both statements for the second problem in Lean are false.

That's not great... :sweat_smile: did you make use of an uninhabited type by chance?

Chris Hughes (Aug 13 2019 at 12:15):

Yes

Kevin Kappelmann (Aug 13 2019 at 12:15):

Argh, types in Isabelle are always inhabited, which is why it makes more sense there.

Kevin Buzzard (Aug 13 2019 at 12:17):

rofl

Kevin Buzzard (Aug 13 2019 at 12:18):

I remember having this conversation with Manuel Eberl in Big Proof 2. I told him "We defined perfectoid spaces in Lean!" and he said "that sounds pretty simple, all you do is make a structure and prove that it's inhabited" and I said "oh gosh we're nowhere near proving that it's inhabited"

Kevin Buzzard (Aug 13 2019 at 12:19):

[since then we have proved that the empty space is perfectoid]

Kevin Kappelmann (Aug 13 2019 at 12:20):

I'll talk with @maximilian p.l. haslbeck and let you know if we change it to inhabited types only or change the task to disprove both. Nice work Chris :')

Chris Hughes (Aug 13 2019 at 12:27):

You don't actually even need to use uninhabited types to disprove both. prove is false with α = β = γ = bool and disprove is false with α = β = γ = unit

Kevin Kappelmann (Aug 13 2019 at 12:36):

I see, the Isabelle and Coq versions just do α = string, β = nat, γ = Prop, which isn't quite clear from the text. I guess we change it to that in Lean then as well.

Kevin Kappelmann (Aug 13 2019 at 12:48):

I hope this: lean_variables.zip should make sense now @Chris Hughes (I need to wait for Max to change it online).

Chris Hughes (Aug 13 2019 at 13:12):

Managed that one:+1:

Kevin Kappelmann (Aug 13 2019 at 13:15):

Legend! Sorry for the inconvenience - you are really really helpful! :) Again, your submission will re-run when the definition is fixed online.

maximilian p.l. haslbeck (Aug 13 2019 at 14:42):

Legend! Sorry for the inconvenience - you are really really helpful! :) Again, your submission will re-run when the definition is fixed online.

done that

Simon Hudon (Aug 13 2019 at 22:33):

I just submitted a solution which failed because "Illegal keyword "notation"". Are we allowed local notation?

Alex J. Best (Aug 13 2019 at 22:36):

Yes local should work I think

Simon Hudon (Aug 13 2019 at 22:42):

That's weird. It's still rejecting it.

Simon Hudon (Aug 13 2019 at 22:43):

... and I just noticed local notation `|` t `|` := fintype.card t at the beginning of the template file. Doubly weird

Alex J. Best (Aug 13 2019 at 22:56):

https://github.com/maxhaslbeck/proving-contest-backends/commit/915ab5e7cd1c6a3fd67b4ebc634ad5361b935d5b is where the code was added. Do you see anything in the regex there that explains it?

Simon Hudon (Aug 13 2019 at 23:00):

It looks like the comment mentioning notation is what is tripping it up. I just deleted that comment to check

Simon Hudon (Aug 13 2019 at 23:00):

Yuuup

Simon Hudon (Aug 13 2019 at 23:00):

The easiest fix would be to delete that comment from the template

Kevin Kappelmann (Aug 14 2019 at 08:04):

The easiest fix would be to delete that comment from the template

Thanks, we will remove it! Sorry for the trouble and congrats for being first to solve the problem! :)

Kevin Kappelmann (Aug 14 2019 at 08:09):

also congrats to @Alex J. Best for breaking the system (again!) :fireworks:

Abhimanyu Pallavi Sudhir (Aug 14 2019 at 08:15):

set_option pp.notation should be allowed, right?

Chris Hughes (Aug 14 2019 at 08:16):

You can always delete that before submission without breaking proofs.

Kevin Kappelmann (Aug 14 2019 at 08:21):

set_option pp.notation should be allowed, right?

It would be rejected at the moment. You can find the regex we use here. If someone has suggestions to improve this check (or anything else), let me know :)

Andrew Ashworth (Aug 14 2019 at 08:42):

I don't quite understand why checking is done via regex? You could call lean itself at trust level zero to check any proofs. How are things implemented on the back end?

Abhimanyu Pallavi Sudhir (Aug 14 2019 at 08:47):

set_option pp.notation should be allowed, right?

It would be rejected at the moment. You can find the regex we use here. If someone has suggestions to improve this check (or anything else), let me know :)

Just change the RegEx to ^notation?

Kevin Kappelmann (Aug 14 2019 at 08:50):

I don't quite understand why checking is done via regex? You could call lean itself at trust level zero to check any proofs. How are things implemented on the back end?

The compile/check commands are here. Setting the trust level to 0 is described as do not trust any macro and type check all imported modules. I do not know what is meant by "macro" in that setting. Also, would that mean that all imported mathlib files are type-checked again?

Kevin Kappelmann (Aug 14 2019 at 08:52):

set_option pp.notation should be allowed, right?

It would be rejected at the moment. You can find the regex we use here. If someone has suggestions to improve this check (or anything else), let me know :)

Just change the RegEx to ^notation?

you could then just add, for example, a whitespace, i.e. notation ... := ..., and it would succeed.

Abhimanyu Pallavi Sudhir (Aug 14 2019 at 08:54):

^\s*notation?

Chris Hughes (Aug 14 2019 at 08:56):

I don't quite understand why checking is done via regex? You could call lean itself at trust level zero to check any proofs. How are things implemented on the back end?

The compile/check commands are here. Setting the trust level to 0 is described as do not trust any macro and type check all imported modules. I do not know what is meant by "macro" in that setting. Also, would that mean that all imported mathlib files are type-checked again?

I think yes. The default trust setting is max. This could be changed on the mathlib binaries.

Kevin Kappelmann (Aug 14 2019 at 08:59):

^\s*notation?

That just solves the whitespace case. You can also have things like #print "Let's cheat" notation 'false' := true

Kevin Kappelmann (Aug 14 2019 at 09:01):

I think yes. The default trust setting is max. This could be changed on the mathlib binaries.

Hmm, I do not know how to tweak the compile process such that mathlib is trusted and the submission file is not. Are there any references where I can learn how to do that?

Sebastian Ullrich (Aug 14 2019 at 09:05):

I don't think trust 0 gets you anything if you compile the submission yourself anyway. On the other hand, to be completely safe you should probably put the theorem statement in a separate file that doesn't import the submission. Then you can do something like

import .statement
import .submission

theorem yesitstrue : _root_.thestatement := theproof

Kevin Kappelmann (Aug 14 2019 at 09:10):

I don't think trust 0 gets you anything if you compile the submission yourself anyway. On the other hand, to be completely safe you should probably put the theorem statement in a separate file that doesn't import the submission. Then you can do something like

import .statement
import .submission

theorem yesitstrue : _root_.thestatement := theproof

Ah, that's a good idea to solve the notation issue - thanks! :)

Sebastian Ullrich (Aug 14 2019 at 09:23):

Though I guess someone could still add a notation for _root_.thestatement, haha. So you should beat them to it and define your own notation:
statement.lean

def statement := ...
notation `statement` := statement

check.lean

import .statement
import .submission

theorem yesitstrue : statement := theproof

Now a malicious actor can at most overload statement, which will fail elaboration. At least that's the last trick I can think of.

Rob Lewis (Aug 14 2019 at 09:29):

Or you could do the test with a metaprogram. Write a tactic that retrieves the declaration named thestatement and check that the type of theproof unifies with the body of that declaration.

Sebastian Ullrich (Aug 14 2019 at 09:30):

You still have to call the tactic somewhere. There will be at least one identifier exposed to the attacker, so wrapping it in a notation is the best I can come up with

Rob Lewis (Aug 14 2019 at 09:33):

You can inline the tactic in a run_cmd, no?

Rob Lewis (Aug 14 2019 at 09:34):

Or even define the tactic in statement.lean and just call it with a run_cmd in check.lean.

Sebastian Ullrich (Aug 14 2019 at 09:35):

Yes, and then I'll shadow your tactic call in check.lean with a notation for tactic.skip in my submission :)

Rob Lewis (Aug 14 2019 at 09:39):

Ah, yeah, of course.

Rob Lewis (Aug 14 2019 at 09:42):

I did a low tech version of this for checking homework assignments in my course this spring. It definitely wasn't secure, but I figured if any students learned enough about Lean to trick it, they deserved the points anyway.

Kevin Buzzard (Aug 14 2019 at 10:50):

Do you do anything about the trick Gabriel Ebner raised whereby a malicious piece of Lean code can delete all the files on your computer?

Kevin Buzzard (Aug 14 2019 at 10:50):

I think it was once a Lean issue?

Rob Lewis (Aug 14 2019 at 11:00):

I remember having that discussion at Big Proof 1. Any programming language can do this, but Lean is kind of special in that code gets executed as soon as you open it in the editor.

Rob Lewis (Aug 14 2019 at 11:00):

https://github.com/leanprover/lean/issues/1781

Mario Carneiro (Aug 14 2019 at 15:10):

The regex looks very fragile to me. I think this is still possible:

namespace tactic.interactive
open lean.parser
@[user_command] meta def evil_cmd (_ : interactive.parse $ tk "evil") : lean.parser unit :=
with_input command_like ("nota" ++ "tion `false` := 0 = 0") >> return ()
end tactic.interactive
evil

theorem contradiction : false := rfl

Kevin Kappelmann (Aug 14 2019 at 15:17):

I'd say, just try it out! and if it works, please hide the solution :')

Sebastian Ullrich (Aug 14 2019 at 15:46):

I had forgotten that we allow whitespace inside tokens... to be honest, there probably is no simpler bullet-proof solution than parsing the export format. If you put the theorem statement in a separate import, you "just" have to check that the type of the exported definition is a constant expression of the statement type's name.

Alex J. Best (Aug 14 2019 at 15:50):

Yeah maybe that is easier, from my experiments yesterday breaking it (sorry @Kevin Kappelmann ), regex is hard to get perfect and esp. with lean's syntactic complexity ruling out all the edge cases sounds like it might take a little while, maybe the internal format is the way to go.

Floris van Doorn (Aug 14 2019 at 19:01):

I've been also thinking how to make this robust. Maybe this works:
As @Sebastian Ullrich suggested have a separate file statement.lean, which contains the statement, but does not import submission.lean. To make it robust, how about we generate a random name with 16 random letters (different for every time someone submits), and then:

statement.lean

def <random_name> := false

check.lean

import .statement .submission

theorem yesitstrue : <random_name> := soundness_bug

Does anyone know an attack against that?

Rob Lewis (Aug 14 2019 at 19:53):

.submission could have a metaprogram that reads the file statement.lean and tries to parse the name.

Kevin Kappelmann (Aug 14 2019 at 19:59):

If I understand correctly, in Isabelle, they save the statement context in a local file and then load that file in .submission - see here page 4

Kevin Buzzard (Aug 14 2019 at 20:04):

This thread should be renamed "cheating for fun"

Kevin Kappelmann (Aug 14 2019 at 20:08):

Sebastian's proposal seems to be the easiest, most reliant option so far to me.

Kevin Kappelmann (Aug 14 2019 at 20:10):

btw, we got a Lean winner! :muscle: congrats @Simon Hudon

Simon Hudon (Aug 14 2019 at 20:11):

Thanks :)

Floris van Doorn (Aug 16 2019 at 21:12):

I want to PR the lemmas I used in these exercises to mathlib, but I also don't want to give hints to other provers... I guess I'll wait till mid-September.

Simon Hudon (Aug 16 2019 at 21:29):

You can try giving the proof by trust_me

Floris van Doorn (Aug 17 2019 at 00:32):

(deleted - that moment when "Reply" doesn't reply to the bottom message in your stream :P)

David Michael Roberts (Aug 18 2019 at 02:15):

[since then we have proved that the empty space is perfectoid]

What's the terminal perfectoid space, if there is one? Some analogue of Spec(Z)?

Kevin Buzzard (Aug 18 2019 at 14:18):

There won't be one. There are perfectoid fields of char 0 and perfectoid fields of char p, maybe there is some weird pathological object. But it's not the right question. People work with perfectoid spaces over a fixed perfectoid field and over that field you just look at the affinoid space attached to that field. The problem is that we have no examples of perfectoid fields yet even though they're do-able -- we just need more algebra and field theory in Lean and more people

Johan Commelin (Aug 24 2019 at 15:03):

@David Michael Roberts I think that nowadays the answer is: perfectoid spaces embed into the category of diamonds. And there is no terminal diamond, which is a feature. Because now you can look at Spa Q_p × Spa Q_p and it will not be Spa Q_p. (Philosophically, this means that we now think that we know how to take p-adic fibre products over F1\mathbb{F}_1.)

Kevin Kappelmann (Sep 05 2019 at 10:25):

Given the following file test.lean

axiom «quot.sound : a» : false
theorem broken : false := «quot.sound : a»

When I run

lean test.lean -E check.out --json --only-export=broken
leanchecker check.out broken

the output of leanchecker is

axiom quot.sound : false
theorem broken : false
checked 12 declarations

I would have expected to have the first line say something like axiom «quot.sound : a» : false. Is this a bug or working as expected? @Alex J. Best came across this while checking the fix I committed for the proving for fun competition. We are detecting non-built-in axioms, e.g. everything besides propext, classical.choice, quot.sound, by parsing the output of leanchecker. In this case, we would not be able to detect any cheat.

Sebastian Ullrich (Sep 05 2019 at 10:36):

Now I'm even more convinced that you should parse the export file yourself :)

Alex J. Best (Sep 05 2019 at 14:25):

Maybe one alternative is to use https://github.com/gebner/trepplein I just tried it and with --print-all-decls I see axiom quot.sound : a : false for the above example.

Alex J. Best (Sep 05 2019 at 14:25):

Or better yet, run both!

Keeley Hoek (Sep 05 2019 at 14:34):

right, and then you could just prohibit the declaration of any extra axioms?

Alex J. Best (Sep 05 2019 at 14:35):

Yes, trepplein helpfully prints /- builtin -/ before any builtin axioms so that seems easier in this case.

Kevin Kappelmann (Sep 05 2019 at 16:02):

I had a brief look at trepplein, but I do not want to install scalar, sbt, etc. as well when setting up the backend (installing trepplein actually failed for me at first because there is a bug with JDK10 and optimisations). I just included the type of the axiom in the check for now

Kevin Kappelmann (Sep 05 2019 at 16:10):

Thanks for all the help, that is breaking my code multiple times @Alex J. Best

Alex J. Best (Sep 05 2019 at 16:13):

Thanks for putting up with me! It looks pretty robust now.

Kevin Buzzard (Sep 11 2019 at 20:24):

Hey what is happening with Proving for Fun this month? The current competition is about to end. Will the new one support Lean? @Kevin Kappelmann do you know about this stuff?

Kevin Buzzard (Sep 11 2019 at 20:25):

I've been trying the August 2019 problems and some are infuriating.

Kevin Buzzard (Sep 11 2019 at 21:29):

Am I allowed to import stuff which isn't imported yet? I don't have use in the PCP problem.

Alex J. Best (Sep 11 2019 at 22:11):

Yes you should be able to import anything from mathlib

Andrew Ashworth (Sep 12 2019 at 00:05):

do solutions for these problem get released at the end of the month? I'm interested in comparing across provers

Alex J. Best (Sep 12 2019 at 00:29):

Yes: see https://github.com/maxhaslbeck/ProvingForFun-July2019 for July's. Also there is another solution in the "issues" tab there that is not saved as a file.

Alex J. Best (Sep 12 2019 at 00:30):

There are no lean ones on there yet though, just one of @Chris Hughes on the issues page

Kevin Buzzard (Sep 12 2019 at 04:20):

Maybe this will change on Saturday :-)

Kevin Kappelmann (Sep 12 2019 at 08:15):

Hey what is happening with Proving for Fun this month? The current competition is about to end. Will the new one support Lean? Kevin Kappelmann do you know about this stuff?

@Kevin Buzzard There is in fact a whole workshop including a new competition tomorrow at ITP. You can again solve all problems in Lean, Isabelle, or Coq. Moreover, you can discuss, for example, how to organise, improve, and make use of proving contests to help spark interest for the ITP community at the workshop :) (I'm sadly not at ITP, but @maximilian p.l. haslbeck will be)

Kevin Kappelmann (Sep 12 2019 at 08:18):

do solutions for these problem get released at the end of the month? I'm interested in comparing across provers

Yes, solutions will be released at the end of the month. If @Chris Hughes does not mind, I can provide his Lean solutions in the July repository. At the end of the August competition, I suppose that Max will create a similar repository including sample solutions.

Chris Hughes (Sep 12 2019 at 08:20):

I don't mind. They're not necessarily particularly elegant solutions.

Kevin Kappelmann (Sep 12 2019 at 08:48):

Yes: see https://github.com/maxhaslbeck/ProvingForFun-July2019 for July's. Also there is another solution in the "issues" tab there that is not saved as a file.

I uploaded the solutions for the two problems that were formalised in Lean now; they are quite simple though. The ones for August gonna be more interesting.

Kevin Buzzard (Sep 12 2019 at 23:27):

I just finished PCP but looks like I missed the deadline :D I got caught out by the fact that I'm not in the right time zone ;-)

Andrew Ashworth (Sep 12 2019 at 23:37):

Thank you for uploading the solutions. I feel like having a collection of worked examples in multiple languages is very educational. Maybe we need a Rosetta Code for theorem proving... http://rosettacode.org

Andrew Ashworth (Sep 12 2019 at 23:41):

Sort of like Freek's 100 problems list but more general

Kevin Buzzard (Sep 12 2019 at 23:43):

Thanks for adding the Lean formalisations of the July questions! Can I somehow upload my solution to PCP (August) even though the deadline is passed? Or have I missed my chance?

Kevin Buzzard (Sep 12 2019 at 23:46):

By the way, when I download the zip files, I always have to rename the names of the Lean files, e.g. from Defs.lean to defs.lean (or else the import fails). I'm on Ubuntu.

maximilian p.l. haslbeck (Sep 13 2019 at 05:44):

I just finished PCP but looks like I missed the deadline :D I got caught out by the fact that I'm not in the right time zone ;-)

The PCP task is now in the "Alltime Contest", fell free to submit your solution there.

Kevin Kappelmann (Sep 13 2019 at 09:18):

By the way, when I download the zip files, I always have to rename the names of the Lean files, e.g. from Defs.lean to defs.lean (or else the import fails). I'm on Ubuntu.

That's very true and inconvenient - I created an issue on GitHub

maximilian p.l. haslbeck (Sep 13 2019 at 15:52):

Today we will have the Proof Ground workshop here in Portland. We will have two contests, one before lunch (10:30 - 12:30 (PDT)) and one after lunch (14:00 - 15:30 (PDT)). Feel free to participate from anywhere on earth and have fun by solving and mechanizing the tasks :).

maximilian p.l. haslbeck (Sep 13 2019 at 17:31):

The morning session is starting now. You can see the problems here and submit solutions: https://competition.isabelle.systems/competitions/contest/13/

maximilian p.l. haslbeck (Sep 13 2019 at 21:18):

The afternoon session has started. https://competition.isabelle.systems/competitions/contest/14/

Andrew Ashworth (Sep 14 2019 at 00:13):

Looks like sledgehammer_squad completed every problem! How much of their proofs was via sledgehammer as opposed to human reasoning?

Simon Hudon (Sep 14 2019 at 00:16):

They were clever in how they broke down the problem between the two of them and sledgehammer allowed them to skirt detailed proofs

Andrew Ashworth (Sep 14 2019 at 00:20):

Do you know which part of sledgehammer was the most effective?

Andrew Ashworth (Sep 14 2019 at 00:21):

as in was their a particular solver it called out to that solved most of their intermediate lemmas

Scott Morrison (Sep 14 2019 at 00:28):

What the business with

-- the goal used in submission
notation `GOAL` :=

in the Lean template. Am I meant to be confused by this?

Manuel Eberl (Sep 14 2019 at 00:31):

I can tell you that I at least didn't use sledgehammer very much. Occasionally when I was too lazy to figure out what lemmas I need for some obvious fact. But I don't use it that much because it breaks my flow if I have to wait for it to finish.

Manuel Eberl (Sep 14 2019 at 00:32):

We only worked on one problem in parallel, and there sledgehammer did probably save us a few minutes because it made the "integration" of my lemmas and Peter's proof that wanted to use them easier.

Manuel Eberl (Sep 14 2019 at 00:34):

In any case, the difficult part is figuring out what auxiliary lemmas you need, what kind of induction pattern to use, and then you sometimes have to do ugly case distinctions. Sledgehammer doesn't help with any of that. So the role of sledgehammer in our work today really shouldn't be overestimated.

Simon Hudon (Sep 14 2019 at 00:40):

Thanks for chiming in @Manuel Eberl. It is quite impressive what a head start you two had. Any idea what was most determinant in your performance?

Andrew Ashworth (Sep 14 2019 at 00:44):

oh - is it weird that I'm actually a little disappointed? I was hoping there was going to be "that one secret tactic that computer scientists don't want you to know about" that would solve my proofs. But, like always, the solution is: think really hard and be a smarter human, which is difficult work as you know, haha

Manuel Eberl (Sep 14 2019 at 00:48):

Thanks for chiming in Manuel Eberl. It is quite impressive what a head start you two had. Any idea what was most determinant in your performance?

Hard to say. The automation probably helped a lot. Also I have 8 years of experience with the system, and Peter even more.

Manuel Eberl (Sep 14 2019 at 00:49):

I think the real question is: What obstacles did the other teams encounter?

Manuel Eberl (Sep 14 2019 at 00:50):

The grammar problem and the "fold" one were a bit tricky, but the others were fairly straightforward. So I think the best strategy is to figure out where the other teams go stuck with those simple problems and why.

Manuel Eberl (Sep 14 2019 at 00:51):

Floris's team was only a few minutes behind us in the first session. I'll ask him what the problem was in the second session. I'm curious myself.

Simon Hudon (Sep 14 2019 at 02:54):

Speaking for Floris' team, in the second round, we didn't split the workload smartly and when we got stuck we didn't switch problems. Some simple ideas also escaped us.

Mario Carneiro (Sep 14 2019 at 09:01):

@Floris van Doorn Here's the "slick" proof I was thinking of for the balanced words challenge:

import data.list.basic data.nat.basic

inductive balanced : list bool  Prop
| empty : balanced []
| app {x y} : balanced x  balanced y  balanced (x ++ y)
| tf {l} : balanced l  balanced (tt :: l ++ [ff])
| ft {l} : balanced l  balanced (ff :: l ++ [tt])

theorem balanced.mk' :  b {l}, balanced l  balanced (bnot b :: l ++ [b])
| tt _ := balanced.ft
| ff _ := balanced.tf

inductive unbalanced (b : bool) : list bool    Prop
| bal {} {l₁ l₂} : balanced l₁  balanced l₂  unbalanced (l₁ ++ b :: l₂) 0
| up {} {l₁ l₂ n} : balanced l₁  unbalanced l₂ n  unbalanced (l₁ ++ b :: l₂) (n.succ)

theorem unbalanced.app {b l₁ l₂ n} (h₁ : balanced l₁) (h₂ : unbalanced b l₂ n) :
  unbalanced b (l₁ ++ l₂) n :=
begin
  induction h₂ with l₂ l₃ h₂ h₃ l₂ l₃ n h₂ h₃ IH; rw  list.append_assoc,
  { exact unbalanced.bal (h₁.app h₂) h₃ },
  { exact unbalanced.up (h₁.app h₂) h₃ }
end

inductive tipsy (l : list bool) :     Prop
| left (m k) : unbalanced tt l k  tipsy (m + k).succ m
| right (m k) : unbalanced ff l k  tipsy m (m + k).succ
| bal {m} : balanced l  tipsy m m

theorem lem1 :  {l m n}, m = n  tipsy l m n  balanced l
| l _ _ _ (tipsy.bal h) := h
| l _ _ e (tipsy.left m k _) := (ne_of_gt (nat.lt_succ_of_le (nat.le_add_right _ _)) e).elim
| l _ _ e (tipsy.right m k _) := (ne_of_lt (nat.lt_succ_of_le (nat.le_add_right _ _)) e).elim

theorem lem2a {b l n} (h : unbalanced b l n) :
  (nat.cases_on n (balanced (bnot b :: l)) (unbalanced b (bnot b :: l)) : Prop) :=
begin
  cases h with l₁ l₂ h₁ h₂ l₁ l₂ n h₁ h₂,
  { simpa using (h₁.mk' b).app h₂ },
  { simpa using h₂.app (h₁.mk' b) },
end

theorem lem2b {b l} (h : balanced l) : unbalanced b (b :: l) 0 :=
unbalanced.bal balanced.empty h

theorem lem2c {b l n} (h : unbalanced b l n) : unbalanced b (b :: l) (n.succ) :=
unbalanced.up balanced.empty h

theorem lem2 {l} : tipsy l (l.count tt) (l.count ff) :=
begin
  induction l with b l IH, {constructor, constructor},
  cases b, all_goals {
    simp [-add_comm],
    generalize_hyp : l.count tt = m at IH ,
    generalize_hyp : l.count ff = n at IH ,
    cases IH },
  { cases IH_k,
    { constructor, exact lem2a IH_a },
    { rw [ nat.succ_add], constructor, exact lem2a IH_a } },
  { rw [ nat.add_succ], constructor, exact lem2c IH_a },
  { exact tipsy.right _ 0 (lem2b IH_a) },
  { rw [ nat.add_succ], constructor, exact lem2c IH_a },
  { cases IH_k,
    { constructor, exact lem2a IH_a },
    { rw [ nat.succ_add], constructor, exact lem2a IH_a } },
  { exact tipsy.left _ 0 (lem2b IH_a) }
end

theorem goal {l} : balanced l  l.count tt = l.count ff :=
⟨λ h, by induction h; simp [*, -add_comm], λ h, lem1 h lem2

Kevin Kappelmann (Sep 14 2019 at 09:19):

Hope you guys had fun and maybe discussed some ideas how to further extend and make use of the competitions system!

What the business with

-- the goal used in submission
notation `GOAL` :=

in the Lean template. Am I meant to be confused by this?

This was done to prevent the "notation cheat". See this post.

Scott Morrison (Sep 14 2019 at 10:18):

Could we maybe have a comment to the effect that it's irrelevant implementation garbage, that someone interested in doing the problem should ignore?

Scott Morrison (Sep 14 2019 at 11:32):

(deleted)

Floris van Doorn (Sep 14 2019 at 18:04):

@Mario Carneiro That's a slick proof indeed. Very nice!

Kevin Buzzard (Sep 18 2019 at 16:41):

@Manuel Eberl posted a link to his proofs for Proof Ground 2019 on Twitter. I'd like to have a discussion about some of those problems, and in particular why some of them might be harder in Lean than in Isabelle. But I keep not starting it because I've not finished all the questions yet :-)

Mario Carneiro (Sep 18 2019 at 19:36):

I haven't seen these proofs yet but I think you probably will get only a weak signal regarding overall ease of use of the library from a 3 hour competition. You are far more likely to see noise relating to how well the competitor knows the library and the right proof strategy for the competition problems

Simon Hudon (Sep 18 2019 at 19:39):

That's also my opinion

Alexandre Rademaker (Nov 02 2019 at 21:12):

In the experiments I am doing with large theories, I believe it would be nice to have something similar to Sledgehammer in Lean. The proof https://github.com/arademaker/sumo/blob/lists/bs.lean#L178-L210 is just a sequence of uses of the same 2-3 axioms to navigate in the SUMO hierarchy.

Floris van Doorn (Feb 17 2020 at 20:09):

The discussion about Codewars sparked me to try to break the system again. It is still possible: https://github.com/maxhaslbeck/proving-contest-backends/issues/27

Kevin Kappelmann (Feb 17 2020 at 22:14):

War. War never changes...

Alexandre Rademaker (Jul 17 2020 at 18:21):

Andrew Ashworth said:

Thank you for uploading the solutions. I feel like having a collection of worked examples in multiple languages is very educational. Maybe we need a Rosetta Code for theorem proving... http://rosettacode.org

My first contribution. I have also created a page for Lean. See

http://rosettacode.org/wiki/Fibonacci_sequence#Lean

Alexandre Rademaker (Jul 18 2020 at 22:33):

The competitors are taking their turn... https://rosettacode.org/wiki/Category:Isabelle. ;-)

Kevin Buzzard (Jul 19 2020 at 10:02):

I don't think of these other systems as competitors. I believe that early lean development was hugely inspired by Isabelle. I think the interesting question is what does and doesn't work in the systems (or more precisely how well various parts of methods work in the systems). Isabelle has demonstrated that complex analysis works great there. I am unclear about whether there is a serious Coq development but we have essentially nothing in Lean. Perhaps if the Isabelle people explain how to do it, we can learn!

Alexandre Rademaker (Jul 22 2020 at 22:34):

Hi @Kevin Buzzard surely. I was just making some fun. I completely agree that we can and must learn from each other. Regarding the rosettacode.org, I like the idea of the webiste. Having the same problem encoded in many different tools/language help us understand the differences. That is why I am motivating people to contribute with more Lean code. Next I would like to try the sort algorithms that already have Isabelle code there.


Last updated: Dec 20 2023 at 11:08 UTC