Zulip Chat Archive

Stream: IMO-grand-challenge

Topic: 1962 problem 1


Kevin Lacker (Oct 05 2020 at 20:30):

i got another formalization pull request out. https://github.com/leanprover-community/mathlib/pull/4450
it's interesting, i feel like i spend more effort on formalizing the parts that are "trivial" to a human mathematician, rather than the parts that are "difficult" to a human mathematician.

Scott Morrison (Oct 05 2020 at 22:05):

Did norm_digits get PR'd? If not it probably should be --- someone other that Mario can make the PR.

Kevin Lacker (Oct 05 2020 at 22:09):

i dont feel great about PRing in a big blob of code i dont understand. but i would love for someone else to

Kevin Lacker (Oct 05 2020 at 22:11):

actually, @Shing Tak Lam commented on the other pull request with a one-liner with rw that does the same thing as digit_recursion

Kevin Lacker (Oct 05 2020 at 22:11):

so... maybe there already is a way to do it in the existing library, it is just a complicated one

Scott Morrison (Oct 05 2020 at 22:16):

it's interesting, i feel like i spend more effort on formalizing the parts that are "trivial" to a human mathematician, rather than the parts that are "difficult" to a human mathematician.

Do you have a feeling more tactics would help? (e.g. things like interval_cases)

Shing Tak Lam (Oct 05 2020 at 22:17):

Kevin Lacker said:

actually, Shing Tak Lam commented on the other pull request with a one-liner with rw that does the same thing as digit_recursion

Yes, but something like digit_recursion should still be in the library imo. I think digits_aux should really be treated as an implementation detail and the users should really only need digits.

Kevin Lacker (Oct 05 2020 at 22:18):

that makes sense - i ended up not being able to figure out how to use the digits aux stuff, and so i just wrote digits_helper because that was the form i wished the lemmas were in

Kevin Lacker (Oct 05 2020 at 22:18):

also because i was sick of proving 10 > 2 so many times

Kevin Lacker (Oct 06 2020 at 15:40):

Scott Morrison said:

it's interesting, i feel like i spend more effort on formalizing the parts that are "trivial" to a human mathematician, rather than the parts that are "difficult" to a human mathematician.

Do you have a feeling more tactics would help? (e.g. things like interval_cases)

I wish there were fewer tactics, that worked in more situations. Tactics like linarith, simp, ring, library_search, norm_num, nlinarith are all convenient at times, but when I pick between them I don't feel like I'm doing advanced mathematics, I feel like that process could be automated. Perhaps the core of the library just needs to be 10x faster and more along these lines will be possible.

Mario Carneiro (Oct 06 2020 at 15:47):

that sounds like you are describing tidy

Kevin Lacker (Oct 06 2020 at 15:56):

i wasn't even aware of tidy! i guess that goes back to the problem of, there are so many different tactics available, just becoming aware of all relevant tactics and understanding what they do is most of the effort

Mario Carneiro (Oct 06 2020 at 16:06):

tidy is a kitchen sink tactic written by Scott when he got fed up with exactly what you are describing

Mario Carneiro (Oct 06 2020 at 16:06):

it will call simp and ring and linarith and all the others

Heather Macbeth (Oct 06 2020 at 16:25):

It's also somehow just really painful to do things with actual numbers in Lean. I usually just give up, so I think it's great that you are pushing through to get us some examples of these terrible, painful proofs. Hopefully specific examples will provide the data needed for new lemmas and more automation!

Kevin Buzzard (Oct 06 2020 at 16:27):

Kevin have you gone through the problem sheets from Lean for the Curious Mathematician? These were specifically designed to show people how to do lots of different kinds of mathematics in Lean. There are accompanying videos which talk you through the sheets.

Heather Macbeth (Oct 06 2020 at 16:33):

Here's a problem with actual numbers which I put on a real analysis problem set a few weeks ago. Easy on paper; most of my students did it without difficulty. I tried to do it in Lean but abandoned the effort when it became clear how ugly a complete solution would look.

def converges_to (x :   ) (a : ) : Prop :=  ε > 0,  M : ,  n  M, |x n - a| < ε

example : converges_to (λ n, n / (7 * n - 1)) (1 / 7) := sorry

Kevin Lacker (Oct 06 2020 at 16:43):

yeah, similarly that huge thread about why linarith couldn't prove 123456 < 1000000

Kevin Lacker (Oct 06 2020 at 16:45):

your converges_to example is interesting... it feels like you'd want a ton of helper lemmas

Kevin Lacker (Oct 06 2020 at 16:48):

but yeah i'd have to look through 100's of existing lemmas. so far i have only looked at the trivial arithmetic type lemmas, trying to prove things like (39 * c + 24) % 39 = 24

Heather Macbeth (Oct 06 2020 at 16:48):

It depends what you mean by helper lemmas. I specifically wanted the students to prove it using "by hand" epsilon-delta analysis, rather than by invoking theorems about limits of addition, division, etc. So I wanted a Lean proof that did this, too.

Kevin Lacker (Oct 06 2020 at 16:49):

i see. but you end up with something like, you have two fractions of polynomials and you want to prove the inequality and it's just super cumbersome

Heather Macbeth (Oct 06 2020 at 16:50):

Yeah, it was terrible. Here's where I was when I abandoned ship :sad:

local notation `|`x`|` := abs x

example : converges_to (λ n, n / (7 * n - 1)) (1 / 7) :=
begin
  intros ε ,
  obtain M, hM :  (M : ), M > max (1 / (41 * ε)) 1 := exists_nat_gt _,
  use M,
  intros n hn,
  have : max (1 / (41 * ε)) 1 < n,
  { have : (M:)  n,
    { norm_cast,
      assumption },
    linarith },
  have h_denom : (7:) * n - 1  6 * n,
  { have h_max_right : max (1 / (41 * ε)) 1  1 := le_max_right _ _, --(1 / (41 * ε)) 1,
    linarith },
  have h_denom_nonzero : (7:) * n - 1  0 := by linarith,
  have h_7 : (7:)  0 := by linarith,
  have h : n / (7 * n - 1) - (1:) / 7 = 7⁻¹ * (7 * n - 1)⁻¹,
  { field_simp [h_denom_nonzero, h_7],
    ring },
  have h_denom_abs : abs ((7:) * n - 1) = (7:) * n - 1 := abs_of_nonneg (by linarith),
  have h_7_abs : abs (7:) = 7 := abs_of_nonneg (by linarith),
  simp only [abs_mul, abs_inv, *],
  suffices : 7 * 7⁻¹ * (7 * n - 1)⁻¹ < 7 * ε * (7 * n - 1)⁻¹ * (7 * n - 1),
  { have : (0:) < 7 := by norm_num,
    sorry },
  simp [h_7],
  have h_max_left : max (1 / (41 * ε)) 1  (1 / (41 * ε)) := le_max_left _ _,
  have : (1 / (41 * ε))  n := by linarith,
  sorry
end

Kevin Lacker (Oct 06 2020 at 16:50):

@Mario Carneiro tidy doesn't work on (39 * c + 24) % 39 = 24 by the way. also, if I solve something using tidy, can I see what lemmas it used?

Heather Macbeth (Oct 06 2020 at 16:51):

Kevin Lacker said:

also, if I solve something using tidy, can I see what lemmas it used?

tidy?

Kevin Lacker (Oct 06 2020 at 16:54):

see previously in this thread, where mario was saying i should try using tidy

Heather Macbeth (Oct 06 2020 at 16:56):

No, I mean actually type tidy? to query tidy! See docs#tactic.tidy.

Kevin Lacker (Oct 06 2020 at 16:56):

hmm, i can't cut and paste your code, because it doesn't recognize the |

Kevin Lacker (Oct 06 2020 at 16:56):

oh, i see. lol

Heather Macbeth (Oct 06 2020 at 16:57):

Kevin Lacker said:

hmm, i can't cut and paste your code, because it doesn't recognize the |

I'll edit above to include the missing notation def, sorry!

Kevin Lacker (Oct 06 2020 at 16:57):

is a normal | supposed to be absolute value? or do you have to to \|

Kevin Lacker (Oct 06 2020 at 16:58):

oh just abs

Kevin Lacker (Oct 06 2020 at 16:58):

ok

Kevin Lacker (Oct 06 2020 at 17:03):

yeah i tried doing this differently and ran into trouble quickly. intuitively i would just be like, any big number should work, pick some random polynomial in (1/epsilon). but actually you need to map between nats and reals, and you need to handle the somewhat stupid case where epsilon is less than 1.

Heather Macbeth (Oct 06 2020 at 17:04):

Yep, all this. Also fractions are hard. Every time the denominator changes slightly you need a certificate that the new denominator is nonzero.

Kevin Lacker (Oct 06 2020 at 17:05):

something like Mathematica will solve this, right?

Kevin Lacker (Oct 06 2020 at 17:06):

it won't output a certificate, but it is probably only making valid steps, so it would be theoretically possible to do so

Heather Macbeth (Oct 06 2020 at 17:07):

I think Lean would solve it fairly fast, too, if I allowed myself to use, say, the whole docs#asymptotics.is_O library.

Heather Macbeth (Oct 06 2020 at 17:08):

But since the write-out-a-bunch-of-explicit-inequalities proof is human-trivial, I wish it were Lean-trivial too.

Heather Macbeth (Oct 06 2020 at 17:08):

And I mentioned it here because it reminded me of similar struggles I've seen in a couple of your IMO formalizations.

Kevin Lacker (Oct 06 2020 at 17:08):

yeah - that seems like a subproblem of the IMO challenge in a sense. if the "IMO grand challenge" is for an AI to solve IMO problems, consider a similar "IMO Trivial Challenge" where the question is, can the AI automatically solve everything that a human mathematician considers trivial

Kevin Lacker (Oct 06 2020 at 17:09):

what fraction of formalizing an IMO problem is formalizing part that a human would consider trivial? looking at the formalizations so far it's maybe... 80%?

Kevin Lacker (Oct 06 2020 at 17:09):

so if an AI strategy ends up being something like a tree search, perhaps 80% of the work trawling through this tree is actually handling the "trivial" parts

Kevin Lacker (Oct 06 2020 at 17:11):

a related question is training data - it's gonna be pretty hard to get enough training data for modern AI solutions for IMO problems, but it seems much more possible to generate a supply of trivial math problems, that are not currently trivially solvable for some Lean tactic...

Mario Carneiro (Oct 06 2020 at 17:40):

@Heather Macbeth Well I won't pretend it was easy, but I managed to prove your theorem:

example : converges_to (λ n, n / (7 * n - 1)) (1 / 7) :=
begin
  intros ε ,
  obtain M, hM :  (M : ), M > max (1 / (41 * ε)) 1 := exists_nat_gt _,
  use M,
  intros n hn,
  have hmn : (M:)  n, { norm_cast, assumption },
  obtain hn1, hn2 := max_lt_iff.1 (lt_of_lt_of_le hM hmn),
  replace hn1 : 1 / ε < 41 * n,
  { rw [div_lt_iff' ],
    rw [div_lt_iff' (mul_pos (by norm_num : (0:) < 41) )] at hn1,
    simpa [mul_comm, mul_assoc] using hn1 },
  have : 0 < (7 * n - 1 : ), { linarith },
  rw [div_sub_div _ _ (ne_of_gt this) (by norm_num : (7:)  0)],
  have : 0 < 7 * (7 * n - 1 : ) := mul_pos (by norm_num) this,
  simp [mul_comm, sub_sub_cancel],
  rw [abs_of_pos (inv_pos.2 this),  one_div, div_lt_iff this,  div_lt_iff' ],
  refine lt_of_lt_of_le hn1 _,
  linarith,
end

Heather Macbeth (Oct 06 2020 at 17:40):

Congratulations!

Mario Carneiro (Oct 06 2020 at 17:40):

I think the "right way" to prove this though is a way to reason about asymptotics

Mario Carneiro (Oct 06 2020 at 17:41):

rather than doing silly epsilon delta stuff

Heather Macbeth (Oct 06 2020 at 17:41):

Well, yes.

Heather Macbeth said:

I think Lean would solve it fairly fast, too, if I allowed myself to use, say, the whole docs#asymptotics.is_O library.

Heather Macbeth (Oct 06 2020 at 17:42):

But it would be nice if both ways are possible. Sometimes one needs to reason about inequalities among rational functions, without there being any asymptotics involved.

Mario Carneiro (Oct 06 2020 at 17:44):

Your point about pain around nonzero divisors is well taken. The field_simps tactic helps a lot with this general problem but it's still not easy

Mario Carneiro (Oct 06 2020 at 17:45):

The proof I gave above is I think a lot more "manual" than the one you did, with more explicit rewrite chains

Heather Macbeth (Oct 06 2020 at 17:45):

Heather Macbeth said:

But it would be nice if both ways are possible. Sometimes one needs to reason about inequalities among rational functions, without there being any asymptotics involved.

Here's a landmark paper from my own field that happens to require this kind of reasoning
https://arxiv.org/abs/0705.0710
(as one tiny human-trivial section in an article otherwise filled with difficult things)

Kevin Lacker (Oct 06 2020 at 17:47):

a lot of the nonzero stuff might be easier if the ability to prove trivial statements was a little better. e.g.

example (n : ) : 7 * n - 1  0 := by tidy

that doesn't work but I can imagine a world where it works.

Mario Carneiro (Oct 06 2020 at 17:47):

Heh, it amuses me to think that the hard part in the definition of the calabi functional on page 5 is the division

Mario Carneiro (Oct 06 2020 at 17:48):

@Kevin Lacker that statement is false

Kevin Lacker (Oct 06 2020 at 17:48):

really? it is not working for me right now

Mario Carneiro (Oct 06 2020 at 17:48):

which is one of many reasons why this is hard

Mario Carneiro (Oct 06 2020 at 17:48):

no, your theorem is false

Kevin Lacker (Oct 06 2020 at 17:48):

oh, because 0 - 1 is 0 in lean world

Kevin Lacker (Oct 06 2020 at 17:48):

ha

Kevin Lacker (Oct 06 2020 at 17:49):

example (n : ℕ) (h : n > 0) : 7 * n - 1 ≠ 0 := by tidy then

Mario Carneiro (Oct 06 2020 at 17:49):

linarith should be able to get that

Mario Carneiro (Oct 06 2020 at 17:49):

maybe not because you are still using nat subtraction

Kevin Lacker (Oct 06 2020 at 17:49):

it does not

Mario Carneiro (Oct 06 2020 at 17:50):

example (n : ) (h : n > 0) : (7 * n - 1 : )  0 := by linarith

works

Kevin Lacker (Oct 06 2020 at 17:51):

re: the hard part of the calabi function being division, it's like playing chess. the computer can easily play chess way better than any human, even those who studied for years. but to actually pick up a chess piece and put it down on a different square, that's hard, basically cutting edge robotics, although a 5-year-old has no trouble.

Mario Carneiro (Oct 06 2020 at 17:51):

I guess this version is true even without the hypothesis but that becomes more of a divisibility proof than a linear arithmetic

Mario Carneiro (Oct 06 2020 at 17:52):

It's definitely true that formalization takes the easy things and the hard things and completely turns them on their head

Kevin Lacker (Oct 06 2020 at 17:52):

here's another thing that would be nice to be trivial: example (n : ℕ) : n * n ≠ 3 := by tidy

Mario Carneiro (Oct 06 2020 at 17:53):

that could conceivably be dec_trivial

Mario Carneiro (Oct 06 2020 at 17:53):

but you need a decision procedure tailored for squares

Mario Carneiro (Oct 06 2020 at 17:54):

the hard part is selecting the right tool for the job

Kevin Lacker (Oct 06 2020 at 17:57):

i thought this might be easier but it still doesn't tidy: example (n : ℕ) : (n * n) % 4 ≠ 3 := by tidy i mean i have no idea what tidy is doing under the hood, so i dont blame it per se. that just makes it clearer to a human that you can do it by cases

Mario Carneiro (Oct 06 2020 at 17:57):

Kevin Lacker said:

Mario Carneiro tidy doesn't work on (39 * c + 24) % 39 = 24 by the way. also, if I solve something using tidy, can I see what lemmas it used?

The proof of (39 * c + 24) % 39 = 24 should be two steps: apply the theorem a < b -> (b * c + a) % b = a, and then norm_num

Mario Carneiro (Oct 06 2020 at 17:58):

I should remark that I'm not the tidy maintainer, and I would never want to be because any tactic that puts itself in a "solve every goal" situation is setting its users up for disappointment

Mario Carneiro (Oct 06 2020 at 18:00):

tidy literally does what I said: it applies simp and linarith and all the other tools in the box

Kevin Lacker (Oct 06 2020 at 18:00):

right, i just don't know what all the tools in the box are

Mario Carneiro (Oct 06 2020 at 18:01):

well, now you know that none of them solves this without advice

Mario Carneiro (Oct 06 2020 at 18:02):

There are some fancy tactics but most are fairly predictable, and none do magic

Mario Carneiro (Oct 06 2020 at 18:03):

Case bashing is a thing you can do but it's not done automatically because it's expensive

Mario Carneiro (Oct 06 2020 at 18:03):

for example, you could phrase your square theorem as something about (n : zmod 4) * n != 3 which I think you can prove by dec_trivial

Heather Macbeth (Oct 06 2020 at 18:05):

Mario, while we're on this subject, can you cast an eye over Kevin's PR #4366 to check whether the case-bashing is done with maximal efficiency?

Mario Carneiro (Oct 06 2020 at 19:00):

Hm, well if you ask me (and you did), I think it would be better to state the search thusly:

def search_upto (c n : ) : Prop :=
n = c * 11   m : , m < n  problem_predicate m  solution_predicate m

lemma search_upto_start : search_upto 9 99 := rfl, λ n h p, by linarith [ge_100 p]⟩

lemma search_upto_step {c n} (H : search_upto c n)
  {c' n'} (ec : c + 1 = c') (en : n + 11 = n')
  {l} (el : nat.digits 10 n = l)
  (H' : c = sum_of_squares l  c = 50  c = 73) :
  search_upto c' n' :=
begin
  subst ec, subst en, subst el,
  obtain rfl, H := H,
  refine by ring, λ m l p, _⟩,
  obtain h₁, m, rfl⟩, h₂ := id p,
  by_cases h : 11 * m < c * 11, { exact H _ h p },
  have : m = c, {linarith}, subst m,
  rw [nat.mul_div_cancel_left _ (by norm_num : 11 > 0), mul_comm] at h₂,
  refine (H' h₂).imp _ _; {rintro rfl, norm_num}
end

lemma search_upto_end {c} (H : search_upto c 1001)
  {n : } (ppn : problem_predicate n) : solution_predicate n :=
H.2 _ (by linarith [lt_1000 ppn]) ppn

/-
Now we just need to combine the results from the `search` lemmas.
-/

lemma right_direction {n : } : problem_predicate n  solution_predicate n :=
begin
  have := search_upto_start,
  iterate 82 {
    replace := search_upto_step this (by norm_num1; refl) (by norm_num1; refl)
      (by norm_digits; refl) dec_trivial },
  exact search_upto_end this
end

I can add this to the PR but it depends on the norm_digits tactic in the other PR

Mario Carneiro (Oct 06 2020 at 19:00):

I dream of the day when we can prove the CFSG like this

Scott Morrison (Oct 06 2020 at 21:55):

Actually, contrary to what has been said above, tidy doesn't try linarith at all.

Scott Morrison (Oct 06 2020 at 21:56):

tidy mostly uses split, ext, intros, dsimp, simp, assumption, solve_by_elim.

Scott Morrison (Oct 06 2020 at 21:56):

That is, it is mostly a "take things apart, simplify, try to put things back together" tactic.

Scott Morrison (Oct 06 2020 at 21:57):

(oh, and auto_cases, which tries to uses cases without doing case splits)

Scott Morrison (Oct 06 2020 at 21:57):

This actual list is at https://github.com/leanprover-community/mathlib/blob/master/src/tactic/tidy.lean#L41

Scott Morrison (Oct 06 2020 at 21:58):

A different tactic which is more of a "kitchen sink" is hint, which just tries a bunch of tactics and reports which ones make progress.

Scott Morrison (Oct 06 2020 at 21:59):

Unfortunately the best way to see which tactics hint will try out is to grep for add_hint_tactic in the sources, as it is hooked up nonlocally.

Scott Morrison (Oct 06 2020 at 21:59):

Fortunately, on the other hand, it's therefore easy to add new tactics to its repertoire.

Jason Rute (Oct 06 2020 at 22:41):

Kevin Lacker said:

so if an AI strategy ends up being something like a tree search, perhaps 80% of the work trawling through this tree is actually handling the "trivial" parts

I think a lot of the new AI systems for ITP (TacticToe, DeepHOL, ProverBot9001, Tactician, GPT-f, etc.) are getting better at filling in these routine steps, but I agree that better evaluation and/or datasets would help. For example, GPT-f was improved by adding in some routine calculations into the train set. Also we need to understand what types of problems these systems excel at and where the gaps are.

I'm not necessarily saying that GPT-f style systems are the way to go with the IMO grand challenge, and I don't want to derail the work on careful formalization and dissection of IMO problems to say just throw a 100-billion parameter black box at anything. But at the same time, systems like Tactician have should a lot of promise without a lot of computational cost, so they could become one tool of many in the tool belt.)

Kevin Lacker (Oct 06 2020 at 22:52):

yeah, I thought GPT-f in particular was just a little too simple to be useful yet - it seems like the problems it is capable of solving are still a smaller set than some of the more complicated tactics, like ring

Kevin Lacker (Oct 06 2020 at 22:53):

i feel like the architecture is not ideal - the GPT structure isn't really doing a tree search type thing. but i'm not sure

Kevin Lacker (Oct 06 2020 at 22:54):

the largest model is still not solving most of the 9-digit division problems

Kevin Lacker (Oct 06 2020 at 22:56):

so, is it going to be able to be extended to be something that's like, one step beyond norm_num or nlinarith, that could reduce n * n = 3 to false? it seems like more parameters won't get there with the GPT-f architecture

Jason Rute (Oct 06 2020 at 23:36):

Kevin Lacker said:

so, is it going to be able to be extended to be something that's like, one step beyond norm_num or nlinarith, that could reduce n * n = 3 to false? it seems like more parameters won't get there with the GPT-f architecture

GPT-f (along with the other systems I've mentioned) definitely does tree search.

Jason Rute (Oct 06 2020 at 23:41):

Also, GPT-f is unique in that it is for MetaMath which doesn't have tactics, but that has nothing to do with the design of GPT-f. For example, DeepHOL or Tactician both guide tactics, including tactics like ring and nlinarith when it makes sense. The problem is that if you add a new tactic, they need to be retrained. They don't currently have the full power of hierarchical reinforcement learning or program synthesis where one can build up high-level actions from low-level actions.

Joseph Myers (Oct 07 2020 at 01:35):

Kevin Lacker said:

what fraction of formalizing an IMO problem is formalizing part that a human would consider trivial? looking at the formalizations so far it's maybe... 80%?

About 95% of formalizing one (easier-than-IMO) geometry problem was teaching mathlib enough geometry for that one problem, but quite likely 80% of what went in mathlib there was also human-trivial, so that doesn't contradict your point. My formalization of the combinatorics problem from the same paper ended up taking 3000 lines of Lean, almost all concerned with trivialities.

Mario Carneiro (Oct 07 2020 at 01:40):

Yeah, I think there are two distinct categories of trivial there that should be teased apart. There is (1) library building, and then (2) rote application of theorems to the problem context (and then (3) actual interesting mathematics). It is fairly common for (1) to make up the majority of any given formalization project, with numbers ranging from 60% to 90% for a new area. For a well established area it might get as low as 10% but I don't think I've ever seen a formalization project actually be completely served by whatever library existed at the outset

Mario Carneiro (Oct 07 2020 at 01:40):

everyone builds a for_mathlib file

Mario Carneiro (Oct 07 2020 at 01:42):

But once you have actually finished the proof, you can refactor it to improve the backing library and take that down to 0%, and then you can see the proof as it would be if the library was already prepared for it. This is when the question of how much triviality becomes more interesting

Joseph Myers (Oct 07 2020 at 01:48):

I've yet to formalize an olympiad problem without finding something to add to mathlib proper, and elementary algebra and number theory are areas that are pretty well-covered in mathlib. (Having formalized an olympiad paper with one problem in each area, I might well now prefer problems that show up gaps in mathlib over those that more routinely use mainly existing lemmas. But both kinds are useful when hoping to learn where we could do with better automation, or to provide training material for AIs.)


Last updated: Dec 20 2023 at 11:08 UTC