Zulip Chat Archive

Stream: Is there code for X?

Topic: Working with big-O notation like an analyst


Terence Tao (Oct 22 2023 at 02:52):

Mathlib.Analysis.Asymptotics.Asymptotics supports a concept f =O[l] g of big-O notation, where for sake of discussion we can take f g : β„• β†’ ℝ and l := Filter.atTop β„•. But I don't see how to easily use this notation to implement the type of calculations one often does in analysis, a typical one of which is

(n+1)^{e^{1/n}} = (n+1)^{1 + O(1/n)}
= exp((log 𝑛 + 𝑂(1/𝑛))(1+𝑂(1/𝑛))
= exp( log n + O(log n/n) )
= n (1 + O(log n/n))
= n + O(log n).

Ideally this should be set up as a calc, with some basic rules to manipulate expressions involving asymptotic notation (e.g., x = O(1) β†’ exp x = 1 + O(x)) that one can insert into various Lean tactics such as rw). But it is difficult to formalize this calculation either in the reals ℝ (where the O() notation makes no sense), or in the type of functions ℕ→ℝ (where O() does make sense, but many rules of algebra, e.g., x^y = exp (y * log x), are not currently implemented, and also there isn't the framework to deal with unspecified functions like O()). One can eventually prove the above result in Lean by repeatedly using ext and rcases and use at each point where the asymptotic notation appears, but it is quite tedious.

For my immediate applications I think I can get by by simply defining one new explicit constant for each appearance of the O() notation, with each constant depending on previous ones in some (possibly non-constructive) fashion. But I wonder whether there is a way to use the full power of dependent type theory to actually implement asymptotic notation as it is used by analysts. It seems to me that one should view each expression such as exp((log 𝑛 + 𝑂(1/𝑛))(1+𝑂(1/𝑛)) appearing in the above as a type, built from a base type n which is not a natural number precisely, but rather a sort of "formal natural number" suitable for asymptotic analysis (in particular, formal natural numbers are allowed to be not fully deterministic, in order to accommodate the incompletely specified terms coming from O() notation). One could imagine having Lean formalize some general transfer principle to the effect that any fact such as x^y = exp( y * log x ) that is true for reals (ignoring for now the side hypothesis x>0 for simplicity) would carry over to formal real types, and then there would be some sort of specialization theorem that asserts that any result such as \( f n = O( g n ) \) that was established for the base type \(n\) would imply the asymptotic bound \( f =O[l] g \) as defined in Mathlib.Analysis.Asymptotics.Asymptotics. (I am not extremely familiar with topos theory, but perhaps what I am trying to describe here is some sort of "asymptotic topos".) Is this remotely feasible to implement within Lean?

Heather Macbeth (Oct 22 2023 at 03:35):

Terence Tao said:

It seems to me that one should view each expression such as exp((log 𝑛 + 𝑂(1/𝑛))(1+𝑂(1/𝑛)) appearing in the above as a type, built from a base type n which is not a natural number precisely, but rather a sort of "formal natural number" suitable for asymptotic analysis

One way to try to write these "formal natural numbers" would be with a calculus like the following:

import Mathlib

inductive FormalElementaryFunction
  | n : FormalElementaryFunction
  | const : β„• β†’ FormalElementaryFunction
  | add : FormalElementaryFunction β†’ FormalElementaryFunction β†’ FormalElementaryFunction
  | mul : FormalElementaryFunction β†’ FormalElementaryFunction β†’ FormalElementaryFunction
  | inv : FormalElementaryFunction β†’ FormalElementaryFunction
  | exp : FormalElementaryFunction β†’ FormalElementaryFunction
  | log : FormalElementaryFunction β†’ FormalElementaryFunction

namespace FormalElementaryFunction

instance (n : β„•) : OfNat FormalElementaryFunction n where ofNat := const n
instance : Add FormalElementaryFunction := ⟨add⟩
instance : Mul FormalElementaryFunction := ⟨mul⟩
instance : Inv FormalElementaryFunction := ⟨inv⟩
instance : Pow FormalElementaryFunction FormalElementaryFunction where pow f g := exp (log f * g)
instance : Div FormalElementaryFunction where div f g := f * g⁻¹

example : FormalElementaryFunction := (n + 1) ^ (exp (1 / n))
example : FormalElementaryFunction := exp (log n + (log n / n))

I haven't attempted to write this part
Terence Tao said:

in particular, formal natural numbers are allowed to be not fully deterministic, in order to accommodate the incompletely specified terms coming from O() notation

or the transfer principle you describe ...

Trebor Huang (Oct 22 2023 at 03:52):

Is there any attempt in any proof assistant at all that allows you to use the internal language of some topos, and later exit it and obtain a theorem in the external world automatically?

Trebor Huang (Oct 22 2023 at 03:55):

On the other hand, maybe O(f) can be seen as a macro that lifts an existence quantifier to the toplevel: It becomes "There exists a function F such that ..." and replaces the O(f) with F. Macros can perform arbitrary transformations on the syntax so it is possible, but it looks difficult.

Mario Carneiro (Oct 22 2023 at 04:00):

O(x) as a macro sounds like a great idea

Mario Carneiro (Oct 22 2023 at 04:03):

I think the real challenge with the "calc block" that @Terence Tao shows is that the equality isn't really an equality, it's an inclusion of sets with the most formalistic interpretation of O(_), and sometimes not even that (especially when multiple O(_) expressions start appearing in the term). It seems exceedingly difficult to make it more than a proof sketch with all the ambiguity in the notation

Trebor Huang (Oct 22 2023 at 04:13):

I'm sure some logician has already done that right? There are some quirks like 1 + O(x) = O(x), which means "For all F asymt. to x, there exists G asymt. to x, such that 1+F=G", note the quantifier difference

Mario Carneiro (Oct 22 2023 at 04:14):

right, that's not equality, that's subset

Terence Tao (Oct 22 2023 at 04:27):

Trebor Huang said:

Is there any attempt in any proof assistant at all that allows you to use the internal language of some topos, and later exit it and obtain a theorem in the external world automatically?

That could potentially be very useful for advanced mathematics, though I really don't know if this is feasibly implementable in current proof assistants. Just in analysis alone, I can think of several other topoi besides the "asymptotic topos" that we informally jump in and out of (even though most analysts are not trained in topos-theoretic language): the nonstandard topos (where every standard type X gets a nonstandard supertype (* X) with useful transfer principles), the probabilistic topos (where every deterministic type X gets a probabilistic supertype of X-valued random variables, again with useful transfer principles), the continuous topos (each topological type X gets a supertype of X-valued quantities varying continuously with respect to ambient parameters), the polynomial time topos, etc.. We can (and do) simulate all these topoi with set-theoretic functions (e.g., a random real number is modeled as a function from a probability space to the reals), but it isn't completely ideal (e.g., there are "base change" operations that are annoying to implement in a function-based framework, and in the absence of a good transfer principle one will be using function extensionality ext over and over and over again...).

Mario Carneiro (Oct 22 2023 at 04:34):

There is enough structure there that you could have a calc block-like environment such that

calc_asymp n, l in
  (n+1)^(e^(1/n))
  = (n+1)^(1 + O(1/n)) := _
  = exp ((log n + O(1/n)) * (1+O(1/n))) := _
  = exp (log n + O(log n/n)) := _
  = n * (1 + O(log n/n)) := _
  = n + O(log n) := _

is sugar for something like:

have H1 :
  βˆƒ g, g =O[l] (fun n => 1/n) ∧
  βˆ€αΆ  n in l, (n+1)^(e^(1/n)) ≀ (n+1)^(1 + g n) := _
have H2 :
  βˆ€ f, f =O[l] (fun n => 1/n) β†’
  βˆƒ g1, g1 =O[l] (fun n => 1/n) ∧
  βˆƒ g2, g2 =O[l] (fun n => 1/n) ∧
  βˆ€αΆ  n in l, (n+1)^(1 + f n) ≀ exp ((log n + g1 n) * (1+g2 n)) := _
have H3 :
  βˆ€ f1, f1 =O[l] (fun n => 1/n) β†’
  βˆ€ f2, f2 =O[l] (fun n => 1/n) β†’
  βˆƒ g, g =O[l] (fun n => log n/n) ∧
  βˆ€αΆ  n in l, exp ((log n + f1 n) * (1+f2 n)) ≀ exp (log n + g n) := _
have H4 :
  βˆ€ f, f =O[l] (fun n => log n/n) β†’
  βˆƒ g, g =O[l] (fun n => log n/n) ∧
  βˆ€αΆ  n in l, exp (log n + f n) ≀ n * (1 + g n) := _
have H5 :
  βˆ€ f, f =O[l] (fun n => log n/n) β†’
  βˆƒ g, g =O[l] (fun n => log n) ∧
  βˆ€αΆ  n in l, n * (1 + f n) ≀ n + g n := _
show
  βˆƒ g, g =O[l] (fun n => log n) ∧
  βˆ€αΆ  n in l, (n+1)^(e^(1/n)) ≀ n + g n
from combine H1 H2 H3 H4 H5

where combine is a proof done by the tactic

Mario Carneiro (Oct 22 2023 at 04:38):

The main challenge with things like this is to come up with a precise spec for what one wants the tactic to do and how it should accept input. Scope creep is a real thing here, you have to be careful not to just put all your hopes and dreams into a miracle tactic of the future because then no one can implement it, but with a clear spec a tactic that does what I have illustrated above could probably be done in a day

Terence Tao (Oct 22 2023 at 04:42):

Trebor Huang said:

I'm sure some logician has already done that right? There are some quirks like 1 + O(x) = O(x), which means "For all F asymt. to x, there exists G asymt. to x, such that 1+F=G", note the quantifier difference

Generally speaking one can indeed make asymptotic notation rigorous by

  • treating all O() symbols as referring to subsets (or subtypes) of all functions of the ambient parameter(s)
  • interpreting any relation R means "for any expression X of the form of the LHS, there exists a Y of the form of the RHS such that X R Y". In particular, equality now becomes the subset relation.
  • extending all operations * to "all expressions of the form X * Y, where X is of the form of the LHS and Y is of the form of the RHS".
    For instance A+B is now basically the Minkowski sum of A and B.

  • One can even now define O(A) when A is itself a type that involves asymptotic notation, to denote the subtype of all expressions that are of the form O(X) for some X of the form A. For instance O(O(n)+O(n^2)) = O(n^2).

Presumably the equality sign is reserved in Lean, so I would be fine with replacing = with a slightly different symbol when dealing with asymptotic types if this is needed to be consistent with Lean syntax.

So in principle one could asymptotic notation involving say one natural number parameter β„• by a type of subtypes of functions from β„• to ℝ, and then transferring operations and relations over to this type basically as suggested by Heather.

There are advanced versions of O() notation where there are multiple ambient parameters, e.g., n, d, k ..., and one could have quantites such as O_{d,k}(1) that are bounded by something that can depend on d,k but not on n, but I believe that any solution that could implement single-parameter O() notation in a usable fashion could be extended rigorously to the multi-parameter setting. (o() notation comes with some additional minor subtleties but again I believe the main difficulty is first getting O() to work properly.)

Trebor Huang (Oct 22 2023 at 04:43):

We can do the multi-parameter version with filter shenanigans if I understand correctly

Terence Tao (Oct 22 2023 at 04:49):

Mario Carneiro said:

There is enough structure there that you could have a calc block-like environment such that

I like this proposal! (Incidentally the ≀ signs in your sugar should be = signs because that's what's in the calc_asymp block; but presumably the calc_asymp block would also be able to handle mixtures of =, ≀, <, and also the asymptotic symbol << (which is a synonym for =O[l].) Each individual step in this block would then need a bunch of rcases and ext steps to establish, but one could conceivably develop additional tools later (formalising basic things like O(f)+O(g)=O(f) O(f) + O(g) = O(f) if g=O(f)g = O(f)) to make this less painful.

Terence Tao (Oct 22 2023 at 04:52):

Certainly this looks much more feasible to implement than the "full topos support" idea.

Mario Carneiro (Oct 22 2023 at 05:00):

I think that you probably don't actually want those have goals to be your real subgoals, there are some basic cleanups you can do like intro f hf, although the syntax does not give these variables names and maybe that's annoying when you are in the proof

Terence Tao (Oct 22 2023 at 05:03):

Yeah, that would be nice. Is it possible to perform string operations on variable names? So one can just supply the prefixes with f g in the tactic and it will generate f1, f2, f3, ... for the input functions and g1, g2, g3, ... for the output functions as necessary

Mario Carneiro (Oct 22 2023 at 05:03):

alternatively, you can try to pack these expressions up into a definition, i.e. some variant on IsLittleO that better matches the form of the subgoals, but I am not sure it will be easy to do "rewriting" proofs, at least not without hiding the quantifiers first

Mario Carneiro (Oct 22 2023 at 05:04):

It's possible to do any kind of variable naming one prefers here, but it is considered "good style" to not invent names that the user didn't write (or if so, not to let the user reference them directly)

Terence Tao (Oct 22 2023 at 05:04):

Even just implementing calc_asymp along the lines you suggest would eliminate about half of the quantifier management because your combine tactic is doing some, though not all, of the heavy lifting. It would certainly be better than nothing, and also lets one impose a conceptual high level strategy on the proof.

Mario Carneiro (Oct 22 2023 at 05:05):

Besides the basic "code hygiene" reasons, a more immediate advantage of having the user write down variable names is that you can put type hovers and go-to-declaration sites on those variable identifiers

Terence Tao (Oct 22 2023 at 05:07):

Maybe the user provided labels with f hf g hg could be functions from Fin a and Fin b to the various functions and hypotheses, where a is the number of O() type objects on the LHS and b is the number of O() type objects on the RHS? So one has to use f 0 to refer to the first function, etc..

Mario Carneiro (Oct 22 2023 at 05:08):

would it be too weird to try to stick a variable name on the O(_) expression? like (f := O(_)) or something

Terence Tao (Oct 22 2023 at 05:08):

Though f 0, f 1, etc., could all have slightly different types, I don't know if that would be a problem with the dependent type syntax

Mario Carneiro (Oct 22 2023 at 05:09):

it would probably not result in clear goals

Mario Carneiro (Oct 22 2023 at 05:09):

besides, numbers are not generally the easiest way for humans to understand variable names

Terence Tao (Oct 22 2023 at 05:10):

these are sort of "nonce" functions, like this - intended for very temporary use only

Mario Carneiro (Oct 22 2023 at 05:11):

If the := _ proofs are done by a rewriting style, it may be possible that you actually never even introduce them

Mario Carneiro (Oct 22 2023 at 05:11):

like you apply a lemma saying 1 + O(n) = O(n) and all the quantifier handling is inside the lemma

Mario Carneiro (Oct 22 2023 at 05:12):

in which case maybe intro f hf is not a good preprocessing step

Terence Tao (Oct 22 2023 at 05:12):

That would be desirable, though presumably the additional tools one would develop to rewrite O() type expressions would unpack the O() notation within the proof of the tool, but in a way that is invisible to the end user

Mario Carneiro (Oct 22 2023 at 05:12):

Right, the question here is about what goals the user sees

Mario Carneiro (Oct 22 2023 at 05:13):

we can make combine do pretty much anything as long as there are lemmas justifying it (what I have here is really just some basic transitivity and exists/forall matching so not hard)

Mario Carneiro (Oct 22 2023 at 05:15):

It's also possible they don't actually see a subgoal exactly, but rather there is another DSL for describing the proofs of these subgoals with customized versions of rw et al. That would be more complex though

Terence Tao (Oct 22 2023 at 05:15):

So one could imagine a two-stage deployment where first one implements a bare-bones calc_asymp tactic in which each step still requires quantifier management (but perhaps presented in a form in which the quantifiers are hidden from view even in the tactic goal state by some well chosen definitions), and later introduce rewriting tactics that can manipulate these steps in a way that hides the quantifiers from view completely.

Terence Tao (Oct 22 2023 at 05:22):

One thing I like about your approach is that we don't have to manually re-interpret every operation (e.g., exp) and every type (natural numbers, real numbers, complex numbers, etc.) into the asymptotic framework - it looks like the tool should be able to interpret everything automatically (of course assuming the types have whatever structures are needed to define asymptotic notation).

Terence Tao (Oct 22 2023 at 05:25):

Anyway, I would be interested in discussing this further (in particular in trying to work out a precise spec), though perhaps we should first see if anybody else would be interested in having such a tool besides myself.

Mario Carneiro (Oct 22 2023 at 05:34):

Even if it's just for you I think we could probably find someone to work on it. We're really happy to have you here! But it's true that if other people also want this it might help round out the spec a bit and make sure it's not over-indexed on one particular use case (already above there was some stuff about probabilistic topoi and such, it's unclear whether this would or should cover those examples)

Mario Carneiro (Oct 22 2023 at 05:36):

(To that end, it would be helpful to get some :+1: here from others on whether they would use this tactic and/or would want it to do other things.)

Terence Tao (Oct 22 2023 at 05:43):

I think people who reason with random variables (for instance) don't really need custom calc blocks. I could imagine some feature requests there regarding "base change" operations where at some intermediate stage of the argument one flips some more coins and thus enlarges the ambient probability space, and has to drag all the previously established results about existing deterministic and random variables to this larger space, but that's a feature request for a different time. :grinning:

Terence Tao (Oct 22 2023 at 05:46):

There are a small number of other places in mathematics where non-deterministic notation like O() is used. For instance the quadratic formula x=βˆ’bΒ±b2βˆ’4ac2ax = \frac{-b \pm \sqrt{b^2-4ac}}{2a} has the indeterminate sign Β±\pm. But again, it's rare that such indeterminate expressions are manipulated in large calc type blocks, they are usually isolated and so can be easily managed with standard quantifiers.

Terence Tao (Oct 22 2023 at 05:53):

When manipulating formal power series one does sometimes see calc type blocks involving things like

\exp(f(x)) = \exp( 1 + c_1 x + c_2 x^2 + <higher order terms> )
= 1 + c_1 x + (c_2 + c_1^2/2) x^2 + <higher order terms>

in which one doesn't want to keep explicit track of all the higher order terms, but this can probably be dealt with either by some minor variant of the O() formalism, or else by quotienting out by a suitable ideal of higher order terms. One could imagine having a narrow tool to begin with and generalizing/abstracting it later if other use cases show up.

Mario Carneiro (Oct 22 2023 at 05:55):

I always interpreted <higher order terms> there as shorthand for O(x^3) (or possibly o(x^2) for the really weird stuff)

Mario Carneiro (Oct 22 2023 at 05:56):

I would not usually use a notation like that unless I was actually doing a proof sketch (e.g. in a lecture), it seems like bad form for an actual proof

Terence Tao (Oct 22 2023 at 05:57):

Yeah usually that's the case, though formal power series don't come with a norm and so strictly speaking it isn't O() notation as defined for instance in Mathlib.

Terence Tao (Oct 22 2023 at 05:58):

I wouldn't consider it a use case that necessitates a major departure from the current Lean suite of tactics and tools. (Many such manipulations in formal power series can be dealt with by existing algebraic tools for graded algebras, in particular quotient homomorphisms. Analysis by contrast can't easily do things like "quotient out by all negligible errors" [unless one works in nonstandard analysis, or takes an asymptotic limit] and this is one reason why we rely so much on asymptotic notation instead.)

Terence Tao (Oct 22 2023 at 15:09):

Actually there is another place where indeterminate constants show up in math - ODE, starting with the infamous +C in the fundamental theorem of calculus, thus for instance deriv deriv f = fun x \to x \imp deriv f = fun x \to x^2/2 + C \imp f = fun x \to x^3/6 + C x + D. But again, these constants aren't so numerous and interact with each other so much that a dedicated calc block is really needed; one can simply introduce each constant in turn with an appropriate existential quantifier, opened by an rcases as necessary.

Bhavik Mehta (Oct 22 2023 at 15:14):

This is something I'd also find very valuable, it's showed up for me in the unit-fractions project, in which many statements involved big-O (especially the analytic number theory estimates), in my work on Roth's theorem and its more recent stuff in the Kelley-Meka bound, and again in my current project. My original approach for dealing with these was to make the big O constant explicit and unfold the definition of little o in each statement, but for more complicated expressions this is unsurprisingly not a great idea, so what I do now is say things like "there exists a function g which is O(1/n), and for all n, f(n) = (1 + g(n)) h(n)", where f and h would be explicit. Having a more convenient notation for writing this (and working with it) would be very useful

Bhavik Mehta (Oct 22 2023 at 15:16):

Another thing which would be useful, and importantly distinct, would be automation to show certain things are big/little-O of each other. I found myself proving stuff like n^(1/2) log(n) is o(n) quite a few times, and @Thomas Bloom cited this kind of proof as the most annoying part of our formalisation of his paper as well.

FrΓ©dΓ©ric Dupuis (Oct 22 2023 at 15:18):

Bhavik Mehta said:

Another thing which would be useful, and importantly distinct, would be automation to show certain things are big/little-O of each other. I found myself proving stuff like n^(1/2) log(n) is o(n) quite a few times, and Thomas Bloom cited this kind of proof as the most annoying part of our formalisation of his paper as well.

I believe Manuel Eberl wrote such tactics for Isabelle as part of his PhD thesis, it would be worth a look to see if we could import this into Lean.

Bhavik Mehta (Oct 22 2023 at 15:18):

FrΓ©dΓ©ric Dupuis said:

I believe Manuel Eberl wrote such tactics for Isabelle as part of his PhD thesis, it would be worth a look to see if we could import this into Lean.

Definitely, I had a brief look at this a while ago and I believe they weren't quite strong enough to do the cases we needed, but it'd still be better than nothing! A particular example is in Appendix B2 here: https://arxiv.org/abs/2112.03726

Terence Tao (Oct 22 2023 at 15:29):

Bhavik Mehta said:

Another thing which would be useful, and importantly distinct, would be automation to show certain things are big/little-O of each other. I found myself proving stuff like n^(1/2) log(n) is o(n) quite a few times, and Thomas Bloom cited this kind of proof as the most annoying part of our formalisation of his paper as well.

I can envision eventually having four types of asymptotic notation support that would complement each other very well:

  • A calc_asymp tactic as discussed above
  • A rw_asymp tactic that can take asymptotic assertions such as O(n ^ 2) + O(n) = O(n^2) (which is some syntactic sugar for a lengthier thing involving quantification on various functions f(n), g(n), h(n)) and use them to rewrite expressions such as X = Y + O(n ^ 2) + O(n). [A key difference with rw: rw_asymp should only be able to affect the right-hand side of an asymptotic expression, not the left-hand side, and one is not able to apply a reverse rewrite rw_asymp [<- h]. This is due to the non-symmetric nature of asymptotic equality.]
  • A library of basic theorems providing such assertions to feed into rw_asymp, like we have right now for the elementary arithmetic operations
  • A asymp_simp tactic that can use these theorems iteratively to simplify a complex asymptotic assertion.

But the first task seems to be the simplest and most feasible to write, and would already help out somewhat, so maybe one should start there.

Terence Tao (Oct 22 2023 at 16:13):

Perhaps in order to support all of these tasks we need some new data structures. I'm thinking of a data structure, call it say AsympProp, an instance A : AsympProp of which contains the following data:

  • A filter A.filter
  • A string for the ambient parameter A.parameter_name
  • A string A.proposition_string for the proposition it is trying to encode

and then a method A.prop that generates a Prop that interprets A.proposition_string asymptotically using A.filter and A.parameter_name (it may also need to take as input some portion of the ambient tactic state). For instance if A.filter = l for some filter l, A.parameter_name = "n", and A.proposition_string = "O(n^2)+O(n)=1+O(n^3)", then A.prop would be something like

βˆ€ f1, f =O[l] (fun n => n^2) β†’
  βˆ€ f2, f =O[l] (fun n => n) β†’
    βˆƒ g1 =O[l] (fun n => n^3) ∧
       βˆ€αΆ  n in l, f1(n) + f2(n) = 1 + g1(n)

and then all the tools mentioned above would reference this data structure. [I guess if A.proposition_string could not be interpreted unambiguously, then A.prop should default to False.]

One problem I see with this approach though is that often the interpretation of a proposition string would depend on the current tactic state, in particular what types are being assigned to various names that appear in the string. So one may have to do this at a more "meta" level (I saw macros mentioned previously, perhaps those are expressive enough to implement this sort of functionality?). Or maybe the data type doesn't statically remember the filter, parameter name, and proposition string, but only the final proposition, with some tool to then convert it back into an asymptotic form as needed for asymptotic rewrites etc.

YaΓ«l Dillies (Oct 22 2023 at 16:18):

Note that rw_asymp is a special case of "generalised rewrite" which allows one to rewrite not only by equality but also by any relation that is preserved by the current context. The typical example being that when proving a * b * c * d * e ≀ a * b' * c * d * e from b ≀ b' (and the relevant positivity assumptions), one does not reason "b ≀ b', so a * b ≀ a * b', so a * b * c ≀ a * b' * c, so a * b * c * d ≀ a * b' * c * d, so a * b * c * d * e ≀ a * b' * c * d * e" but rather "I can substitute b with b' because the surrounding expression is monotone in the occurrence of b I want to replace".

Terence Tao (Oct 22 2023 at 16:25):

YaΓ«l Dillies said:

Note that rw_asymp is a special case of "generalised rewrite" which allows one to rewrite not only by equality but also by any relation that is preserved by the current context. The typical example being that when proving a * b * c * d * e ≀ a * b' * c * d * e from b ≀ b' (and the relevant positivity assumptions), one does not reason "b ≀ b', so a * b ≀ a * b', so a * b * c ≀ a * b' * c, so a * b * c * d ≀ a * b' * c * d, so a * b * c * d * e ≀ a * b' * c * d * e" but rather "I can substitute b with b' because the surrounding expression is monotone in the occurrence of b I want to replace".

Morally I agree, but there is the wrinkle that an asymptotic expression such as O(n^2)+O(n) = 1 + O(n^3) cannot be interpreted directly as a type built up recursively from simpler types like O(n^2) and O(n^3), but has to be converted via some meta syntactic sugar to the lengthier expression I described above involving several quantifiers before a rewrite can be applied. I guess the question is whether a rewrite tactic can work with this sugar.

Terence Tao (Oct 22 2023 at 16:32):

... actually, one could interpret big-O notation as describing a type directly, namely a subtype of functions from the parameter space to some normed space, in which case expressions such as O(n^2)+O(n) = 1 + O(n^3) can be interpreted as a recursively defined type (bearing in mind that = is now something closer to the subset/subtype relation). So then maybe all one needs is a tool that shows that proposition is logically equivalent to a proposition that is the big mess of quantifiers as described above, and that way one could have the best of both worlds, interpreting O() notation as a first-class object while also being able to manipulate it with existing tools.

Bhavik Mehta (Oct 22 2023 at 16:34):

I remember discussing this with Thomas and @Alex Kontorovich a while ago as well, I think we came to the same conclusion as this

Terence Tao (Oct 22 2023 at 16:38):

One nice thing about this approach is that it isn't specific to O() notation, it covers all types of indeterminate expressions that depend on parameters in a partially specified fashion. O() and o() are still the biggest use cases, but the same framework could conceivably be used for other types of mathematics.

The one "meta" thing one would still need to do with this approach is find some way to automatically extend operators and relations from the deterministic setting to the indeterminate setting. For instance, the power operation ^ is defined on the reals, so it would be nice to automatically have an operation on this indeterminate real type that makes expressions such as O(n)^O(n) well-defined without having to make an explicit definition for every single conceivable operation one would want to use.

Terence Tao (Oct 22 2023 at 16:44):

With this approach one could basically use the existing calc tactic (possibly one has to replace the equality symbol by something else if equality is reserved for symmetric operations), and not need a separate calc_asymp variant. Instead one would have another tactic, say asymp_quantify, which would replace a hypothesis or goal that involves O() types with their counterparts involving quantifiers, and one has the option at each calc step either of using this tactic to expand things out into quantifiers and work by hand, or else use some future tools to manipulate the asymptotic quantities directly. (The type for asymptotic notation may need to remember a little bit more than just the subtype of functions it represents in order to make asymp_quantify well defined, though, i.e., it might not be 100% extensional.)

Anatole Dedecker (Oct 22 2023 at 16:48):

Note that, if O f designated the subset of functions that are O of f (there would have to be a filter in the notation too of course), then you could already litteraly write O (id ^ 2) + O id βŠ† 1 + O (id ^ 3). Of course for more complicated expressions you'd need to write images, but I think the most reasonable approach is simply to add syntactic sugar and maybe some tactics for manipulating these inclusions.

Terence Tao (Oct 22 2023 at 16:54):

That ... actually looks like a workable solution to me

Terence Tao (Oct 22 2023 at 16:57):

There is some casting issue in which one should identify a deterministic function f with its singleton subset {f} in order to interpret expressions such as n^2 + n = 1 + O(n^3) on equal footing with O(n^2) + O(n) = 1 + O(n^3) or n^2 + n = n + n^2 but presumably the sugar can handle all of this.

Anatole Dedecker (Oct 22 2023 at 17:02):

The solution for emulating "ambient parameters" (I wouldn't like it personally, but that's mostly because I avoid computations in the first place) is really simple : you could just start a proof by let Ξ© := β„• Γ— ℝ; let n : Ξ© β†’ β„• := Prod.fst, let x : Ξ© β†’ ℝ := Prod.snd and basically you can treat them as your ambiant variables. Of course that could be a one-line declare_vars or whatever. The main issue is that you'd have to write a lot of compositions, but this is again mostly a matter of notations.

All of this is not to minimize the task (we still have to figure out clever notations), but I believe that we probably don't need some kind of extra-powered tactic mode keeping in mind a lot of things for us, because what we want to write isn't actually that far from what's happening mathematically.

Terence Tao (Oct 22 2023 at 17:51):

That's a workable solution in many contexts, particularly when the collection of ambient parameters can be set in advance. The one task that remains a bit tricky with that formalism is "base change", when you want to change the ambient parameters. For instance, if you know that a family F of functions on ℝ are equicontinuous, then you can choose your parameter space Ξ© to be F Γ— ℝ Γ— ℝ with coordinate functions f, x, x0 and write equicontinuity in asymptotic form as f x = f x0 + o(1) where the filter is the one where x converges to x0 while allowing f to vary freely. (Other notions of continuity, such as uniform continuity or uniform equicontinuity, would correspond to other filters.) But then suppose you want to combine this equicontinuity with some continuity property of some other function that uses a different parameter space Ξ©'. Then often what you want to do is pass to a larger parameter space such as Ξ© Γ— Ξ©' and lift up all the previous asymptotic results to this larger space. This can in principle be done by hand, but potentially one could benefit from some well designed tactics to automate this.

Anatole Dedecker (Oct 22 2023 at 18:20):

Mathematically this should follow rather easily from our extensive API to manipulate filters (pullbacks, pushforwards, lattice structure), and lemmas such as docs#Asymptotics.IsLittleO.mono. But I don’t have enough examples of that in mind to be able to imagine how easy it would be to do by hand, or what would be needed to make it easier.

Patrick Massot (Oct 24 2023 at 03:29):

@Terence Tao I agree that being able to write O(⋯ )O(\cdots) everywhere like we do on paper would be nice. However I think that we can already go a long way with the ternary relation f=g+O(h)f = g + O(h) and a good set of lemmas (we would say "a good API" in the software engineering jargon). Such lemmas would be needed anyway to power the fancier version you dream of. Mathlib currently only has the binary relation f=O(h)f = O(h), but building the ternary one is straightforward. We also lack many elementary comparison lemmas because Mathlib is very weak on examples and concrete lemmas. Today I wrote https://gist.github.com/PatrickMassot/9619c317cd99bf9ea30696e48a5d9ac2 which starts with a bunch of elementary estimates that should go to Mathlib soon, then build the basic ternary BigO API and ends with your example with (n+1)e1/n(n+1)^{e^{1/n}}. You can drop this file into your project and have look, starting from Line 307.

Patrick Massot (Oct 24 2023 at 03:33):

Let me insist that the purpose of the above message is not to tell you what to do on paper, or even what to do in Lean. However notice there is no existential quantifier juggling in the above file, and no sequence of constant that you would need to track. So the worry you expressed in the first message of this thread is avoidable without new fancy tactics. What is still true is that things would be simpler with a moderately fancy tactic proving basic estimates that occur as side-conditions (there are comments about that in my file).

Terence Tao (Oct 24 2023 at 03:44):

This looks like a workable solution for many basic use cases. I might try to incorporate some of the tools here into my project, thanks.

Patrick Massot (Oct 24 2023 at 04:09):

I forgot to say that some proofs in the first part of the file were written or improved by @Anatole Dedecker, @Heather Macbeth and @Yury G. Kudryashov when I shared my frustration of not finding those estimates in Mathlib.

Yury G. Kudryashov (Oct 24 2023 at 04:57):

How hard would it be to implement a tactic that goes from the inside of a formula, applies existing knowledge about terms and functions and gets an estimate on the whole formula?

Yury G. Kudryashov (Oct 24 2023 at 04:57):

Something like what positivity does for 0 < _/_ β‰  0.

Yury G. Kudryashov (Oct 24 2023 at 04:58):

(clearly, we'll need more lemmas)

Yury G. Kudryashov (Oct 24 2023 at 04:58):

@Terence Tao What are you formalizing?

Terence Tao (Oct 24 2023 at 04:59):

Yury G. Kudryashov said:

Terence Tao What are you formalizing?

https://arxiv.org/pdf/2310.05328.pdf , at https://github.com/teorth/symmetric_project

Pedro SΓ‘nchez Terraf (Oct 24 2023 at 12:17):

Terence Tao said:

Yury G. Kudryashov said:

Terence Tao What are you formalizing?

https://arxiv.org/pdf/2310.05328.pdf , at https://github.com/teorth/symmetric_project

The last commit is already a success story! (I just read the log message.)

Patrick Massot (Oct 24 2023 at 14:06):

There is more detail at: https://mathstodon.xyz/@tao/111287749336059662. Short version: there was an edge-case mistake in the paper that was found during formalization. Interestingly it seems the error was really detected when Lean asked to prove something that clearly didn't follow from the current assumptions (as opposed to detecting it while writing a more detailed version on paper before using Lean).

Yury G. Kudryashov (Oct 24 2023 at 22:13):

I'm going to give a talk about Lean at a dynamics conference in about a month. May I use this blog post as an example of "why should we care about formal math"?

Patrick Massot (Oct 24 2023 at 22:17):

Yury, you should be careful when discussing this example. This is really a rather minor mistake which impacts only a rather non-interesting case of the theorem (low value of nn). More importantly, fixing this mistake brings no new insight about why the theorem is true.

Yury G. Kudryashov (Oct 25 2023 at 00:21):

Then I won't discuss it.

Yury G. Kudryashov (Oct 25 2023 at 00:22):

(I know that it's hard for me to be careful about such things; that's why I asked)

Patrick Massot (Oct 25 2023 at 00:29):

I'm still happy that Lean has been useful to Terry. One thing you can do if you want to be safe is to simply quote him, and avoid adding too many comments.

Terence Tao (Oct 25 2023 at 16:03):

Patrick Massot said:

Terence Tao I agree that being able to write O(⋯ )O(\cdots) everywhere like we do on paper would be nice. However I think that we can already go a long way with the ternary relation f=g+O(h)f = g + O(h) and a good set of lemmas (we would say "a good API" in the software engineering jargon). Such lemmas would be needed anyway to power the fancier version you dream of. Mathlib currently only has the binary relation f=O(h)f = O(h), but building the ternary one is straightforward. We also lack many elementary comparison lemmas because Mathlib is very weak on examples and concrete lemmas. Today I wrote https://gist.github.com/PatrickMassot/9619c317cd99bf9ea30696e48a5d9ac2 which starts with a bunch of elementary estimates that should go to Mathlib soon, then build the basic ternary BigO API and ends with your example with (n+1)e1/n(n+1)^{e^{1/n}}. You can drop this file into your project and have look, starting from Line 307.

I'm now hitting the part of my formalization project where I actually would like to have O() notation, and I realize that I would really like support for using big-O notation while breaking into cases. For instance, I might want to show that F =O[l] G, where F, G depend on two parameters n k, and I have one argument that works when n ≀ k, and another when n > k, then I want to combine them. With explicit quantifiers I can do it like this:

import Mathlib

def F (n : β„•) (k : β„•) : β„• := 10 * n * k  -- toy example only
def G (n : β„•) (k : β„•) : β„• := n^2 + k^2 -- toy example only

lemma est1 : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (n ≀ k) β†’ F n k ≀ C * G n k := by
  sorry -- argue that 10 * n * k ≀  10 * k^2 ≀ 10 * (n^2 + k^2)

lemma est2 : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (n > k) β†’ F n k ≀ C * G n k := by
  sorry -- argue that 10 * n * k < 10 * n^2 ≀ 10 * (n^2 + k^2)

example : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, F n k ≀ C * (G n k) := by
  rcases est1 with ⟨ C1, h1 ⟩
  rcases est2 with ⟨ C2, h2 ⟩
  use max C1 C2
  intro n k
  rcases le_or_gt n k with h | h
  . apply le_trans (h1 n k h) _
    gcongr
    simp
  . apply le_trans (h2 n k h) _
    gcongr
    simp

What would be a civilized approach to doing it in Lean using O() notation?

Mario Carneiro (Oct 25 2023 at 16:50):

The max reasoning should be expressible using filters: for all sufficiently large C est1 holds, and also est2 holds, so they both hold

Anatole Dedecker (Oct 25 2023 at 16:57):

I'm not sure I understand your goal, do you need your estimates to hold everywhere or just for (n, k) large enough? If the former, then I'm not sure docs#IsBigO is that useful (but it should definitely work, see docs#Asymptotics.isBigO_top ).

Jireh Loreaux (Oct 25 2023 at 16:58):

Terry, are you familiar yet with the filter_upwards tactic? This is an essential tactic for working with filters. (although maybe you don't need it here)

Anatole Dedecker (Oct 25 2023 at 17:00):

I don't think using filters on the constants is the right solution here, the API for IsBigO should allow you not to care about this. Instead I would suggest using docs#Asymptotics.isBigO_sup to "split" your filter in two parts. I'll do an example to show what it would look like.

Anatole Dedecker (Oct 25 2023 at 17:01):

(Side note: maybe a lot of the API could be golfed if we add a characterization of docs#IsBigO with an eventually instead of an exists for the constant. But that should not be relevant here imo)

Anatole Dedecker (Oct 25 2023 at 17:05):

Do we really have no norm on N\mathbb{N} ?!?

Jireh Loreaux (Oct 25 2023 at 17:06):

Why would we? You don't get any of the nice structure (from mathlib) since it isn't an additive group.

Anatole Dedecker (Oct 25 2023 at 17:06):

Well for being able to use asymptotics...

Anatole Dedecker (Oct 25 2023 at 17:07):

Okay but it would be useless, like in NNReal. Forget my comment. But we need to fix that at some point (e.g assuming only BoundedSMul for asymptotics)

Jireh Loreaux (Oct 25 2023 at 17:07):

I think @Yury G. Kudryashov had a plan, or at least ideas, for that at some point recently.

Anatole Dedecker (Oct 25 2023 at 17:26):

Here's a version:

import Mathlib

open Filter Asymptotics Set

noncomputable section

def F (n : β„•) (k : β„•) : β„• := 10 * n * k  -- toy example only
def G (n : β„•) (k : β„•) : β„• := ↑(n^2 + k^2) -- toy example only

lemma est1 : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (n ≀ k) β†’ F n k ≀ C * G n k := by
  sorry -- argue that 10 * n * k ≀  10 * k^2 ≀ 10 * (n^2 + k^2)

lemma est2 : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (n > k) β†’ F n k ≀ C * G n k := by
  sorry -- argue that 10 * n * k < 10 * n^2 ≀ 10 * (n^2 + k^2)

lemma est1' : ((↑) ∘ β†ΏF : β„• Γ— β„• β†’ β„€) =O[π“Ÿ {x | x.1 ≀ x.2}] ((↑) ∘ β†ΏG : β„• Γ— β„• β†’ β„€) := by
  rcases est1 with ⟨C, hC⟩
  simp [isBigO_principal]
  refine ⟨C, by exact_mod_cast hC⟩

lemma est2' : ((↑) ∘ β†ΏF : β„• Γ— β„• β†’ β„€) =O[π“Ÿ {x | x.1 > x.2}] ((↑) ∘ β†ΏG : β„• Γ— β„• β†’ β„€) := by
  rcases est2 with ⟨C, hC⟩
  simp [isBigO_principal]
  refine ⟨C, by exact_mod_cast hC⟩

example : ((↑) ∘ β†ΏF : β„• Γ— β„• β†’ β„€) =O[⊀] ((↑) ∘ β†ΏG : β„• Γ— β„• β†’ β„€) := by
  convert est1'.sup est2'
  rw [← principal_univ, sup_principal, principal_eq_iff_eq, eq_comm, eq_univ_iff_forall]
  exact fun x ↦ le_or_gt x.1 x.2

I'm not sure it's the cleanest option we have, and I'm sure someone else will think of a way to make it look like what we'd write on paper, but it has the advantage that it would work basically the same in more complicated settings (e.g if there is actually a limiting process).

Anatole Dedecker (Oct 25 2023 at 17:28):

β†ΏF means "F to which we feed both arguments at the same time". You could also write uncurry F. The whole ((↑) ∘ β†ΏF : β„• Γ— β„• β†’ β„€) should just be β†ΏF, but as mentioned above we currently need additive groups to work with asymptotics, so I'm moving everything to β„€.

Patrick Massot (Oct 25 2023 at 17:33):

I you need this a lot in cases where all filters are principals then it may be simpler to duplicate the BigO API. But maybe good notations and simp lemmas would be enough.

Patrick Massot (Oct 25 2023 at 17:40):

In Anatole's example, we can see we are going against the library because principal_univ is a simp lemma going in the wrong direction for our purposes.

Patrick Massot (Oct 25 2023 at 17:40):

The proof could be

convert est1'.sup est2'
  rw [← principal_univ]
  simp [-principal_univ]
  ext x
  simp
  exact?

which is rewriting using prinipal_univ backward and then excluding it from the simp call.

Patrick Massot (Oct 25 2023 at 17:41):

The goal after simp is univ = {x | x.1 ≀ x.2} βˆͺ {x | x.2 < x.1} which means it is successfully translated to a problem involving Set only.

Kevin Buzzard (Oct 25 2023 at 18:31):

Anatole Dedecker said:

Do we really have no norm on N\mathbb{N} ?!?

No because which prime pp would you choose?

Terence Tao (Oct 25 2023 at 18:32):

OK, I think I can sort of see how to reinterpret all these side conditions on parameters as modifications to the filter, and that in principle one can decouple the manipulation of the primary quantities F and G from manipulations of the filter. My own project is small enough that it is close to the breakeven point where it would be quicker to just open and close the existential quantifiers at each step, but I can believe that the filter-based approach will scale better, particularly with nice simp lemmas and syntactic sugar.

Anyway, thanks for the sample code, I will have to study it more carefully when I have more time.

Anatole Dedecker (Oct 25 2023 at 18:45):

Kevin Buzzard said:

Anatole Dedecker said:

Do we really have no norm on $\mathbb{N}$ ?!?

No because which prime $p$ would you choose?

Well we already chose for Z\mathbb{Z}…

Terence Tao (Oct 25 2023 at 19:34):

Regarding the asymptotic vs non-asymptotic issue. For my application I would like an estimate that holds for all n, but a common technique is to first establish a result for sufficiently large n, and then treat the remaining bounded number of cases by an easier argument. For instance, below is a sketch of a proof that Ξ» n k, Nat.choose n k =O(⊀) Ξ» n k, (2:ℝ)^(n:ℝ) / (n+1 : ℝ)^((2:ℝ)⁻¹) that proceeds by first establishing the claim asymptotically (using Stirling's formula), then treating the bounded cases by cruder methods. As you can see, the proof I have in mind is quite lengthy, mostly due to all the opening and closing of existential quantifiers. If there was some way to do this in a civilized fashion using =O[] notation that would be great. (Also, my proof that n ≀ Nat.succ M β†’ n ≀ M ∨ n = Nat.succ M is embarrassingly long; would be happy to hear of a quicker way. I couldn't get Order.le_succ_iff_eq_or_le to work because lean would not recognize Nat.succ as an instance of Order.succ.)

import Mathlib

def F (n : β„•) (k : β„•) : ℝ := (Nat.choose n k: ℝ)
noncomputable def G (n : β„•) (k : β„•) : ℝ := (2:ℝ)^(n:ℝ) / (n+1 : ℝ)^((2:ℝ)⁻¹)

/-! the objective is to show that F(n,k) = O( G(n,k) ) for all n,k (not just asymptotically. )-/

/-- First, the asymptotic case --/
lemma est1 : βˆƒ N C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (N < n) β†’ F n k ≀ C * G n k := by
  sorry  -- use Stirling approximation

/-- Then, the base case n = 0 --/
lemma est2 : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (n = 0) β†’ F n k ≀ C * G n k := by
  sorry  -- easy computation

/-- Then, the inductive case --/
lemma est3 : βˆ€ n : β„•, (βˆƒ C : β„•, βˆ€ k : β„•, F n k ≀ C * G n k) β†’ (βˆƒ C : β„•, βˆ€ k : β„•, F (Nat.succ n) k ≀ C * G (Nat.succ n) k) := by
  sorry  -- use Pascal identity

lemma pos (n : β„•) (k : β„•) : 0 ≀ G n k := by
  sorry  -- positivity

/-- Finally, the non-asymptotic case -/
example : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, F n k ≀ C * (G n k) := by
  have h : βˆ€ N : β„•, βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (n ≀ N) β†’ F n k ≀ C * G n k := by
    intro N
    induction' N with M hM
    . rcases est2 with ⟨ C2, h2 ⟩
      use C2
      intro n k hn
      simp at hn
      exact h2 n k hn
    rcases hM with ⟨ CM, hCM ⟩
    have hCM' : βˆƒ C : β„•, βˆ€ k : β„•, F M k ≀ C * G M k := by
      use CM
      intro k
      exact hCM M k (show M ≀ M by linarith)
    rcases est3 M hCM' with ⟨ CMsucc, hCMsucc ⟩
    use max CM CMsucc
    intro n k hn
    rcases Nat.of_le_succ hn with hnM | hnM
    . apply le_trans (hCM n k hnM) _
      gcongr
      . exact pos n k
      simp
    rw [hnM]
    apply le_trans (hCMsucc k) _
    gcongr
    . exact pos (Nat.succ M) k
    simp
  rcases est1 with ⟨ N, C1, h1 ⟩
  rcases h N with ⟨ C, hC ⟩
  use max C1 C
  intro n k
  rcases (le_or_gt n N) with hnN | hnN
  . apply le_trans (hC n k hnN) _
    gcongr
    . exact pos n k
    simp
  apply le_trans (h1 n k hnN) _
  gcongr
  . exact pos n k
  simp

Arend Mellendijk (Oct 25 2023 at 19:46):

(I think you're looking for docs#Nat.of_le_succ)

Kevin Buzzard (Oct 25 2023 at 20:14):

Here's a golf for hCM':

    have hCM' : βˆƒ C : β„•, βˆ€ k : β„•, F M k ≀ C * G M k := ⟨CM, fun k ↦ hCM M k le_rfl⟩

Patrick Massot (Oct 25 2023 at 20:55):

I really love filters but here I don't think they would make anything easier. Here is a more efficient version of your code, slightly restating your lemmas est2 and est3.

import Mathlib

def F (n : β„•) (k : β„•) : ℝ := (Nat.choose n k: ℝ)
noncomputable def G (n : β„•) (k : β„•) : ℝ := (2:ℝ)^(n:ℝ) / (n+1 : ℝ)^((2:ℝ)⁻¹)

/-! the objective is to show that F(n,k) = O( G(n,k) ) for all n,k (not just asymptotically. )-/

/-- First, the asymptotic case --/
lemma est1 : βˆƒ N C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (N < n) β†’ F n k ≀ C * G n k := by
  sorry  -- use Stirling approximation

/-- Then, the base case n = 0 --/
lemma est2 : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (n = 0) β†’ F n k ≀ C * G n k := by
  sorry  -- easy computation

/-- Then, the base case n = 0 --/
lemma est2' : βˆƒ C : β„•, βˆ€ k : β„•, F 0 k ≀ C * G 0 k := by
  sorry  -- easy computation

/-- Then, the inductive case --/
lemma est3 : βˆ€ n : β„•, (βˆƒ C : β„•, βˆ€ k : β„•, F n k ≀ C * G n k) β†’ (βˆƒ C : β„•, βˆ€ k : β„•, F (Nat.succ n) k ≀ C * G (Nat.succ n) k) := by
  sorry  -- use Pascal identity

/-- Then, the inductive case --/
lemma est3' : βˆ€ n : β„•, βˆ€ C : β„•,
    (βˆ€ k : β„•, F n k ≀ C * G n k) β†’ (βˆƒ C : β„•, βˆ€ k : β„•, F (Nat.succ n) k ≀ C * G (Nat.succ n) k) :=
  fun n C hC ↦ est3 n ⟨C, hC⟩

theorem exists_by_induction {Ξ± : Type _} {P : β„• β†’ Ξ± β†’ Prop} (hβ‚€ : βˆƒ a, P 0 a)
    (ih : βˆ€ n a, P n a β†’ βˆƒ a', P (n + 1) a') : βˆƒ f : β„• β†’ Ξ±, βˆ€ n, P n (f n) := by
  choose fβ‚€ hfβ‚€ using hβ‚€
  choose! F hF using ih
  exact ⟨fun n => Nat.recOn n fβ‚€ F, fun n => Nat.rec hfβ‚€ (fun n ih => hF n _ ih) n⟩

lemma pos (n : β„•) (k : β„•) : 0 ≀ G n k := by
  sorry  -- positivity

open Finset

/-- Finally, the non-asymptotic case -/
example : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, F n k ≀ C * (G n k) := by
  rcases est1 with ⟨N, C₁, hCβ‚βŸ©
  rcases exists_by_induction est2' est3' with ⟨C, hC⟩
  -- Define `Cβ‚‚` to be the maximum of the `C` function on `{0, ..., N}`.
  let Cβ‚‚ := max' (image C (range (N + 1))) (Finset.nonempty_range_succ.image C)
  use max C₁ Cβ‚‚
  intro n k
  rcases le_or_gt n N with hnN | hnN
  . apply (hC n k).trans
    gcongr
    . exact pos n k
    apply le_max_of_le_right
    exact le_max' (image C (range (N + 1))) (C n) (mem_image_of_mem C (mem_range_succ_iff.2 hnN))
  Β· apply (hC₁ n k hnN).trans
    gcongr
    . exact pos n k
    Β· simp

Patrick Massot (Oct 25 2023 at 21:26):

We can abstract this some more, leading to an existence principle from base case, induction and low values. And then apply it to your example.

import Mathlib

def F (n : β„•) (k : β„•) : ℝ := (Nat.choose n k: ℝ)
noncomputable def G (n : β„•) (k : β„•) : ℝ := (2:ℝ)^(n:ℝ) / (n+1 : ℝ)^((2:ℝ)⁻¹)

/-! the objective is to show that F(n,k) = O( G(n,k) ) for all n,k (not just asymptotically. )-/

/-- First, the asymptotic case --/
lemma est1 : βˆƒ N C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (N < n) β†’ F n k ≀ C * G n k := by
  sorry  -- use Stirling approximation

/-- Then, the base case n = 0 --/
lemma est2 : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (n = 0) β†’ F n k ≀ C * G n k := by
  sorry  -- easy computation

/-- Then, the base case n = 0 --/
lemma est2' : βˆƒ C : β„•, βˆ€ k : β„•, F 0 k ≀ C * G 0 k := by
  sorry  -- easy computation

/-- Then, the inductive case --/
lemma est3 : βˆ€ n : β„•, (βˆƒ C : β„•, βˆ€ k : β„•, F n k ≀ C * G n k) β†’ (βˆƒ C : β„•, βˆ€ k : β„•, F (Nat.succ n) k ≀ C * G (Nat.succ n) k) := by
  sorry  -- use Pascal identity

/-- Then, the inductive case --/
lemma est3' : βˆ€ n : β„•, βˆ€ C : β„•,
    (βˆ€ k : β„•, F n k ≀ C * G n k) β†’ (βˆƒ C : β„•, βˆ€ k : β„•, F (Nat.succ n) k ≀ C * G (Nat.succ n) k) :=
  fun n C hC ↦ est3 n ⟨C, hC⟩

lemma pos (n : β„•) (k : β„•) : 0 ≀ G n k := by
  sorry  -- positivity

theorem exists_by_induction {Ξ± : Type _} {P : β„• β†’ Ξ± β†’ Prop} (hβ‚€ : βˆƒ a, P 0 a)
    (ih : βˆ€ n a, P n a β†’ βˆƒ a', P (n + 1) a') : βˆƒ f : β„• β†’ Ξ±, βˆ€ n, P n (f n) := by
  choose fβ‚€ hfβ‚€ using hβ‚€
  choose! F hF using ih
  exact ⟨fun n => Nat.recOn n fβ‚€ F, fun n => Nat.rec hfβ‚€ (fun n ih => hF n _ ih) n⟩

open Finset

theorem exists_by_induction_and_low_vals {Ξ± : Type _} [LinearOrder Ξ±] {P : β„• β†’ Ξ± β†’ Prop}
    (hβ‚€ : βˆƒ a, P 0 a) (ih : βˆ€ n a, P n a β†’ βˆƒ a', P (n + 1) a')
    {N a₁} (low_vals : βˆ€ n > N, P n a₁) (mono : βˆ€ n, Monotone (P n)) : βˆƒ a, βˆ€ n, P n a := by
  rcases exists_by_induction hβ‚€ ih with ⟨a, ha⟩
  let aβ‚‚ := max' (image a (range (N + 1))) (Finset.nonempty_range_succ.image a)
  use max a₁ aβ‚‚
  intro n
  rcases le_or_gt n N with hnN | hnN
  . exact mono n (le_max_of_le_right <| le_max' (image a (range (N + 1))) (C n)
      (mem_image_of_mem a (mem_range_succ_iff.2 hnN))) (ha n)
  Β· exact mono n (le_max_left a₁ aβ‚‚) (low_vals n hnN)

example : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, F n k ≀ C * (G n k) := by
  rcases est1 with ⟨N, C₁, hCβ‚βŸ©
  -- The weird `fun n hn k ↦ hC₁ _ _ hn` comes from a discrepancy between the order of quantifiers
  -- In the example we have `βˆ€ n : β„•, βˆ€ k : β„•, (N < n) β†’ F n k ≀ C * G n k` where the abstract lemma
  -- want `βˆ€ n > N, βˆ€ k : β„•, F n k ≀ C * G n k`.
  apply exists_by_induction_and_low_vals est2' est3' (fun n hn k ↦ hC₁ _ _ hn)
  clear N C₁ hC₁
  -- Now prove the monotonicity property
  intro C N n hn hC k
  apply (hC k).trans
  gcongr
  apply pos

Patrick Massot (Oct 25 2023 at 21:28):

Note the slightly unusual assumption Monotone (P n) where P is a predicate on Ξ±. You need to know that Lean endows Prop with the order relation where False < True, so that assumption means that P n a implies P n a' whenever a ≀ a'. This is the monotonicity property proved at the very end when applying the abstract lemma to your example.

Patrick Massot (Oct 25 2023 at 21:32):

Also worth a small note: I could have replaced {N a₁} (low_vals : βˆ€ n > N, P n a₁) in the assumptions of exists_by_induction_and_low_vals by (low_vals : βˆƒ N a₁, βˆ€ n > N, P n a₁). This would be logically equivalent, but then the proof would start with a rcases low_vals with ... which is only noise, and the resulting lemma would also be very slightly less convenient to use.

Kevin Buzzard (Oct 25 2023 at 22:03):

Patrick's approach also shows the general principle that in a theorem prover a good approach to proving theorems is to break them down into several small blocks, rather than just going straight for that main result.

Patrick Massot (Oct 25 2023 at 22:05):

Yes, lemma exists_by_induction_and_low_vals is not something you would state and prove on paper. Doing it in a proof assistant allows decoupling the purely logical structure of the argument with what is specific to your application. We have a lot of those lemmas in formal math.

Patrick Massot (Oct 25 2023 at 22:05):

It helps even if you use it only once.

Yury G. Kudryashov (Oct 25 2023 at 22:08):

I'm trying to write a shorter proof.

Yury G. Kudryashov (Oct 25 2023 at 22:23):

import Mathlib

open Filter Asymptotics

def F (n : β„•) (k : β„•) : ℝ := (Nat.choose n k: ℝ)
noncomputable def G (n : β„•) (_k : β„•) : ℝ := (2:ℝ)^(n:ℝ) / (n+1 : ℝ)^((2:ℝ)⁻¹)

/-! the objective is to show that F(n,k) = O( G(n,k) ) for all n,k (not just asymptotically. )-/

/-- First, the asymptotic case --/
lemma est1 : βˆƒ N C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (N < n) β†’ F n k ≀ C * G n k := by
  sorry  -- use Stirling approximation

lemma Fnonneg (n k) : 0 ≀ F n k := Nat.cast_nonneg _
lemma Gpos (n k) : 0 < G n k := by unfold G; positivity

example : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, F n k ≀ C * (G n k) := by
  rcases est1 with ⟨N, C, h⟩
  have hFβ‚€ : βˆ€ n, F n =αΆ [atTop] 0 := fun n ↦ (eventually_gt_atTop n).mono fun k hk ↦ by
    simpa [F, Nat.choose_eq_zero_iff]
  have : βˆ€ n, βˆƒ Cn : β„•, βˆ€ k, F n k ≀ Cn * G n k := fun n ↦ by
    have hFO : βˆ€ n, F n =O[atTop] G n := fun n ↦ (hFβ‚€ n).trans_isBigO (isBigO_zero _ _)
    obtain ⟨Cn, -, hCn⟩ := bound_of_isBigO_nat_atTop (hFO n)
    simp only [Real.norm_of_nonneg, Fnonneg, (Gpos _ _).le] at hCn
    refine ⟨⌈CnβŒ‰β‚Š, fun k ↦ (hCn (Gpos _ _).ne').trans ?_⟩
    gcongr; exacts [(Gpos _ _).le, Nat.le_ceil _]
  choose Cn hCn using this
  have : βˆ€αΆ  D in atTop, C ≀ D ∧ βˆ€ n ≀ N, Cn n ≀ D :=
    (eventually_ge_atTop C).and <| (eventually_all_finite (Set.finite_le_nat N)).2 fun _ _ ↦
      (eventually_ge_atTop _)
  rcases this.exists with ⟨D, hCD, hCnD⟩
  refine ⟨D, fun n k ↦ ?_⟩
  rcases le_or_lt n N with hn | hn
  Β· exact (hCn _ _).trans <| mul_le_mul_of_nonneg_right (Nat.cast_le.2 (hCnD n hn)) (Gpos _ _ ).le
  Β· exact (h n k hn).trans <| mul_le_mul_of_nonneg_right (Nat.cast_le.2 hCD) (Gpos _ _).le

Yury G. Kudryashov (Oct 25 2023 at 23:47):

Or this:

import Mathlib

open Filter Asymptotics Set

def F (n : β„•) (k : β„•) : ℝ := (Nat.choose n k: ℝ)
noncomputable def G (n : β„•) (_k : β„•) : ℝ := (2:ℝ)^(n:ℝ) / (n+1 : ℝ)^((2:ℝ)⁻¹)

/-! the objective is to show that F(n,k) = O( G(n,k) ) for all n,k (not just asymptotically. )-/

/-- First, the asymptotic case --/
lemma est1 : βˆƒ N C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, (N < n) β†’ F n k ≀ C * G n k := by
  sorry  -- use Stirling approximation

lemma Fnonneg (n k) : 0 ≀ F n k := Nat.cast_nonneg _
lemma Gpos (n k) : 0 < G n k := by unfold G; positivity

lemma eventually_forall_iff_cofinite {Ξ± Ξ² : Type*} {r : Ξ± β†’ Ξ² β†’ Prop} {l : Filter Ξ±} :
    (βˆ€αΆ  x in l, βˆ€ y, r x y) ↔ (βˆ€αΆ  p in l Γ—Λ’ cofinite, r p.1 p.2) ∧ βˆ€ y, βˆ€αΆ  x in l, r x y := by
  refine ⟨fun h ↦ ⟨(h.prod_inl _).mono fun p hp ↦ hp _, fun y ↦ h.mono fun x hx ↦ hx y⟩, ?_⟩
  simp only [(l.basis_sets.prod hasBasis_cofinite).eventually_iff]
  rintro ⟨⟨⟨s, t⟩, ⟨hsl : s ∈ l, ht : t.Finite⟩, hβ‚βŸ©, hβ‚‚βŸ©
  have : βˆ€αΆ  x in l, βˆ€ y ∈ t, r x y := (eventually_all_finite ht).2 fun _ _ ↦ hβ‚‚ _
  filter_upwards [hsl, this] with x hxs hxt y
  by_cases hy : y ∈ t
  exacts [hxt y hy, h₁ (Set.mk_mem_prod hxs hy)]

lemma eventually_forall_nat_iff_atTop {Ξ± : Type*} {r : Ξ± β†’ β„• β†’ Prop} {l : Filter Ξ±} :
    (βˆ€αΆ  x in l, βˆ€ y, r x y) ↔ (βˆ€αΆ  p in l Γ—Λ’ atTop, r p.1 p.2) ∧ βˆ€ y, βˆ€αΆ  x in l, r x y := by
  rw [eventually_forall_iff_cofinite, Nat.cofinite_eq_atTop]

example : βˆƒ C : β„•, βˆ€ n : β„•, βˆ€ k : β„•, F n k ≀ C * (G n k) := by
  suffices βˆ€αΆ  C : β„• in atTop, βˆ€ n k : β„•, F n k ≀  C * G n k from this.exists
  simp only [eventually_forall_nat_iff_atTop (Ξ± := β„•)]
  refine ⟨?_, fun n ↦ ⟨?_, fun k ↦ ?_⟩⟩
  · rcases est1 with ⟨N, C, hNC⟩
    refine ((eventually_ge_atTop C).prod_mk (eventually_gt_atTop N)).mono fun _ h _ ↦ ?_
    exact (hNC _ _ h.2).trans <| mul_le_mul_of_nonneg_right (Nat.cast_le.2 h.1) (Gpos _ _).le
  Β· refine ((eventually_gt_atTop n).prod_inr _).mono fun _ h ↦ ?_
    simp [F, Nat.choose_eq_zero_iff.2 h, Gpos]
  Β· exact (tendsto_nat_cast_atTop_atTop.atTop_mul_const (Gpos _ _)).eventually_ge_atTop _

Patrick Massot (Oct 25 2023 at 23:51):

This is nice but I hope my version is closer to being a general explanation instead of focusing on this specific example.

Patrick Massot (Oct 25 2023 at 23:52):

I was commenting the first version, I didn't see the second one which looks more promising.

Yury G. Kudryashov (Oct 25 2023 at 23:53):

It depends on what do you know for each particular n.

Yury G. Kudryashov (Oct 25 2023 at 23:53):

If the best you can do is a proof by induction, then you're right.

Yury G. Kudryashov (Oct 25 2023 at 23:54):

If you can prove that for each n there is an estimate (e.g., because f n =O[atTop] g n), then my version works fine.

Terence Tao (Oct 26 2023 at 02:32):

This is a neat approach, but different from the filter-based approach I was expecting. My vague idea was to interpret est1 as a F =O[l] G assertion for the filter corresponding to large n, est2 as a similar assertion for the principal filter for {0}, est3 as an inductive assertion that lets us control the principal filter for {n} for any n, and then one would conclude from the general properties of filters and their interaction with O() notation. In particular it would be compatible with the filter approach to the previous example (in my actual application, the asymptotic case when n is large is dealt with by a further splitting into cases similar to n ≀ k and k < n in the toy example).

Lars Ericson (Oct 27 2023 at 02:04):

I'm just curious, are the above definitions of f=O(g)f = O(g) consistent with the Wikipedia definition for big O notation, namely:

Let ff, the function to be estimated, be a real or complex valued function and let gg, the comparison function, be a real valued function. Let both functions be defined on some unbounded subset of the positive real numbers, and g(x)g(x) be strictly positive for all large enough values of xx. Then

f(x)=O(g(x))Β asΒ xβ†’βˆžf(x) = O\bigl( g(x)\bigr)\quad\text{ as }x\to\infty

if there exists a positive real number MM and a real number x0x_0 such that

∣f(x)βˆ£β‰€Mg(x)Β forΒ allΒ xβ‰₯x0|f(x)| \le M g(x) \quad \text{ for all } x \ge x_0

Jireh Loreaux (Oct 27 2023 at 02:33):

docs#IsBigOWith doesn't require g to be eventually positive, but we take the norm of g(x) instead.

Jireh Loreaux (Oct 27 2023 at 02:33):

So it's effectively the same

Antoine Chambert-Loir (Oct 27 2023 at 06:14):

Kevin Buzzard said:

Anatole Dedecker said:

Do we really have no norm on $\mathbb{N}$ ?!?

No because which prime $p$ would you choose?

What about the trivial nonarchimedean norm?

Yury G. Kudryashov (Nov 12 2023 at 07:20):

In #8362 I define some predicates (ReflectsGrowth and a few more restrictive versions). What do you think about it?

Yury G. Kudryashov (Nov 12 2023 at 07:21):

Some examples are near the end of the ReflectsGrowth file.

Yury G. Kudryashov (Nov 12 2023 at 07:22):

With these predicates, we can have combinators like mul_lex (to construct terms like x ^ r * exp (a * x)) and lemmas like "a sum of these terms is equivalent to its leading term".

Yury G. Kudryashov (Nov 12 2023 at 07:24):

The reason why I don't assume g = id is that for a complex exponential exp (a * z) (as re z β†’ ∞ and im z is not too large), the right function is re a.

Terence Tao (Dec 18 2023 at 21:42):

I'm using this (dormant) thread to record one possible way to work with asymptotic notation in Lean "like an analyst" that might not be too unwieldy, inspired by an approach we came up with in the PFR project to encode various ambient parameters. Namely, to create a structure (we called it refPackage) of various ambient parameters and the hypotheses placed upon them. In the case of asymptotic notation, all of the implied asymptotic constants C₁, Cβ‚‚, C₃, etc. could be viewed as components of the refPackage, and a single object p: refPackage can be passed to all of the lemmas in the argument, in order to make these constants available as needed (under names such as p.C₁, p.Cβ‚‚, p.C₃, etc.). (In some cases these constants will not be absolute, but will be functions of other parameters, but this is easily handled in Lean by selecting appropriate types for each constant.) To get from one asymptotic constant to another, one needs various relations between these constants (e.g., C₃ β‰₯ Cβ‚‚ + C₁), instances of which one simply adds as additional components to the refPackage as needed, without disrupting the rest of the arguments with additional existential quantifiers etc.. Only at the very end of the proof will one need to show that at least one refPackage exists, i.e., that there exists a choice of parameters that obeys all the required hypotheses, and this can be done in one go (with some giant constructor or something). The point though is that one can defer having to deal with this until all the rest of the proof is stable. In the future one could even imagine a tactic (sort of analogous to filter_upwards) that could automatically break up the verification of existence of a refPackage into a sequence of small existential claims (e.g., given C₁ and Cβ‚‚, that C₃ β‰₯ Cβ‚‚ + C₁ for all sufficiently large C₃) that would make even this last step less tedious (and easier to maintain if the main body of the argument changes).

Patrick Massot (Dec 18 2023 at 21:55):

We already know what to do in this direction, see here.

Terence Tao (Dec 18 2023 at 22:08):

Great, having direct support for procrastinated side conditions would work nicely, but if we are still waiting for that to be deployed, bundling all these conditions into a single package seems to be a feasible workaround.

Though in the absence of an explicit object to hold all the procrastinated conditions, I'm not sure how one would implement procrastination if the argument is split up across multiple lemmas, each with their own separate goal and tactic state.


Last updated: Dec 20 2023 at 11:08 UTC