Zulip Chat Archive

Stream: mathlib4

Topic: define gradient of scalar-valued function on a Hilbert space


Ziyu Wang (Sep 11 2023 at 01:10):

Hello, @Chenyi Li and I are the authors of pull request #7074 (feat: define gradient of scalar-valued function on a Hilbert space). We have read the reviewer's comments on GitHub and greatly appreciate the reviewer's efforts.
We are working on formalizing computational mathematics, starting with convex optimization.
In computational mathematics we mainly focus on functions from $\mathbb{R}^n$ to $\mathbb{R}$, for which the gradient offers a more concise representation.
Also, in practical and physical applications, computing the gradient is often simpler and more efficient, especially in common numerical methods.
Given the direct significance of the gradient in many applications (such as ODEs for multivariate functions, numerical algebra, numerical analysis, the theory of machine learning and deep learning, optimization algorithms, etc.), introducing it could make the library more appealing and practical.
So we think introducing the gradient is worthwhile, and it addresses the issue raised by the reviewer. After it is successfully merged, we will subsequently upload formalizations of the theorems on first-order conditions for convex (and strongly convex) functions and on the convergence rate of the gradient descent method.

Chenyi Li (Sep 11 2023 at 01:15):

Hello, @Ziyu Wang and I are working on pull request #7074 (feat: define gradient of scalar-valued function on a Hilbert space). When examining optimization methods such as gradient descent or stochastic gradient descent, one must directly compute $x - \alpha \nabla f(x)$. In numerous contexts, further calculations with the derivative information are indispensable. Using only 'fderiv' poses challenges: one must subtract a continuous linear map from a vector $x \in \mathbb{R}^n$, and the types do not match. This underscores the need to add a gradient definition to mathlib.
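
For instance, here is a minimal sketch of the typing problem (an illustration only, assuming a real normed space F):

import Mathlib.Analysis.Calculus.FDeriv.Basic

variable {F : Type*} [NormedAddCommGroup F] [NormedSpace ℝ F] (f : F → ℝ) (x : F) (α : ℝ)

-- `fderiv ℝ f x` has type `F →L[ℝ] ℝ`, not `F`, so the naive
-- gradient-descent step fails to typecheck:
-- #check x - α • fderiv ℝ f x  -- type error: `F` vs `F →L[ℝ] ℝ`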

Heather Macbeth (Sep 11 2023 at 01:18):

Thank you for starting this conversation, @Ziyu Wang and @Chenyi Li. I am the reviewer who suggested discussing whether a separate theory of gradients should be added. Let me ping some authors and users of the multivariable calculus library: @Anatole Dedecker @Moritz Doll @Sebastien Gouezel @Patrick Massot @Oliver Nash @Yury G. Kudryashov

Moritz Doll (Sep 11 2023 at 02:01):

Chenyi Li said:

Hello, Ziyu Wang and I are working on pull request #7074 (feat: define gradient of scalar-valued function on a Hilbert space). When examining optimization methods such as gradient descent or stochastic gradient descent, one must directly compute $x - \alpha \nabla f(x)$. In numerous contexts, further calculations with the derivative information are indispensable. Using only 'fderiv' poses challenges: one must subtract a continuous linear map from a vector $x \in \mathbb{R}^n$, and the types do not match. This underscores the need to add a gradient definition to mathlib.

I think what you want there is innerSL 𝕜 x - α • fderiv 𝕜 f x.

Patrick Massot (Sep 11 2023 at 02:10):

Moritz, the issue is that they want the result to have the same type as x, if this is meant as a step in a gradient descent algorithm. Of course you could apply the isomorphism from the dual to the original space. This would be a bit cumbersome, but maybe less so than duplicating so many lemmas about differentials.
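
A sketch of that workaround (gradStep is a hypothetical name, assuming a real inner product space):

import Mathlib.Analysis.InnerProductSpace.Dual
import Mathlib.Analysis.Calculus.FDeriv.Basic

noncomputable section

variable {F : Type*} [NormedAddCommGroup F] [InnerProductSpace ℝ F] [CompleteSpace F]

-- Pull the derivative back through the isometry with the dual, so that
-- the descent step has the same type as `x`.
def gradStep (f : F → ℝ) (x : F) (α : ℝ) : F :=
  x - α • (InnerProductSpace.toDual ℝ F).symm (fderiv ℝ f x)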

Patrick Massot (Sep 11 2023 at 02:12):

Or we could simply inline the definition and see how many lemmas we actually need in order to prove some numerical analysis theorem.

Heather Macbeth (Sep 11 2023 at 02:16):

I'm torn here.

An argument in favour: we have deriv, a special-casing-for-convenience of fderiv when the domain is a scalar, so why not also add a special-casing-for-convenience of fderiv when the codomain is a scalar?
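
Concretely, deriv is by definition the Fréchet derivative evaluated at 1, so the analogy would be a gradient obtained from the Fréchet derivative via the inner product. A one-line check of the former:

import Mathlib.Analysis.Calculus.Deriv.Basic

-- `deriv` is definitionally `fderiv` applied to `1`:
example (f : ℝ → ℝ) (x : ℝ) : deriv f x = fderiv ℝ f x 1 := rfl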

An argument against: it's tedious work to build out a full theory in which all the actual mathematical content is obtained by special-casing more general results, and it may be particularly frustrating for authors who are new mathlib contributors and have to learn all our conventions by trial and error over the course of the review.

Yury G. Kudryashov (Sep 11 2023 at 04:38):

I think that it makes sense to have grad as a separate definition.

Sebastien Gouezel (Sep 11 2023 at 05:19):

Given how useful the gradient is in many situations, I have no doubt we should have it as a first class citizen, roughly on par with deriv. In particular, we should have all the API like HasGradientWithinAt and so on. On the other hand, it should definitely not be defined just by itself, but in terms of fderiv and the canonical isomorphism between the space and its dual. In this way, most results in the API should derive directly from the corresponding fderiv lemmas.

Sebastien Gouezel (Sep 11 2023 at 05:25):

And it's true that it's a difficult PR for newcomers!

Sebastien Gouezel (Sep 11 2023 at 06:27):

A good way to do it would probably be to start over from the file Deriv.lean and adapt it, replacing Deriv with Gradient and deriv with grad in most statements.

Oliver Nash (Sep 11 2023 at 09:57):

I also agree it's worth having this special case of fderiv, and I especially agree with the suggestion to model the new API on the relevant parts of Deriv.lean.

Oliver Nash (Sep 11 2023 at 09:58):

The only unanswered question I see is whether we should specialise to the case where the domain is an inner product space, so that the gradient can be defined to take values in F rather than NormedSpace.Dual 𝕜 F. I would say "yes" (this is what the PR does).

Oliver Nash (Sep 11 2023 at 10:07):

So then I suppose a possible start might be:

import Mathlib.Analysis.InnerProductSpace.Dual
import Mathlib.Analysis.Calculus.FDeriv.Basic

noncomputable section

open Topology

variable {𝕜 F : Type*} [IsROrC 𝕜] [NormedAddCommGroup F] [InnerProductSpace 𝕜 F] [CompleteSpace F]
  (f : F → 𝕜) (f' : F)

-- Optionally one could use `InnerProductSpace.toDualMap` and not require `CompleteSpace F` till
-- later but I doubt there's any point.
def HasGradientAtFilter (x : F) (L : Filter F) :=
  HasFDerivAtFilter f (InnerProductSpace.toDual 𝕜 F f') x L

def HasGradientWithinAt (s : Set F) (x : F) :=
  HasGradientAtFilter f f' x (𝓝[s] x)

def HasGradientAt (x : F) :=
  HasGradientAtFilter f f' x (𝓝 x)

-- Probably don't want this because we're over `ℝ` or `ℂ`.
-- def HasStrictGradientAt (x : F) :=
--   HasStrictFDerivAt f (InnerProductSpace.toDual 𝕜 F f') x

def gradient (f : F → 𝕜) (x : F) : F := (InnerProductSpace.toDual 𝕜 F).symm (fderiv 𝕜 f x)

-- TODO Lots of API
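
-- As one taste of that API (an illustrative sketch, not part of the PR):
-- assuming `HasFDerivAt f f' x` unfolds to `HasFDerivAtFilter f f' x (𝓝 x)`,
-- the bridge to the `fderiv` lemmas is definitional.
theorem hasGradientAt_iff_hasFDerivAt {x : F} :
    HasGradientAt f f' x ↔ HasFDerivAt f (InnerProductSpace.toDual 𝕜 F f') x :=
  Iff.rfl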

Anatole Dedecker (Sep 11 2023 at 12:14):

Even though I'm the first one to annoy people by using Fréchet derivatives instead of gradients, I think it makes sense to have it in mathlib. The example of gradient descent is quite convincing, and I'm pretty sure I had thoughts of a few cases where it does indeed make our life easier. The only one I can think of right now is defining the Laplacian in a way that is obviously invariant under change of orthonormal basis, by Δf(x)=TrDx(gradf)\Delta f(x) = \mathrm{Tr} D_x(\mathrm{grad} f) (aka divergence of gradient).
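
For instance (an illustrative sketch reusing the gradient definition from Oliver's message above; laplacian is a hypothetical name, and finite dimension makes the trace meaningful):

import Mathlib.LinearAlgebra.Trace

variable {F : Type*} [NormedAddCommGroup F] [InnerProductSpace ℝ F] [FiniteDimensional ℝ F]

-- divergence of the gradient: the trace of the derivative of `gradient f` at `x`
noncomputable def laplacian (f : F → ℝ) (x : F) : ℝ :=
  LinearMap.trace ℝ F (fderiv ℝ (gradient f) x : F →ₗ[ℝ] F)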

Anatole Dedecker (Sep 11 2023 at 12:23):

Regarding the actual implementation, the thing I'm not sure about is how much duplication we need. In particular, I don't know if we really want HasGradientAt instead of just gradient (or grad, I don't care about the name). If we do want the predicate, I think we might as well use docs#InnerProductSpace.toDualMap instead of docs#InnerProductSpace.toDual, because it's technically more general (although arguably we may not care) and it doesn't cost anything.

Ziyu Wang (Sep 11 2023 at 17:06):

@Oliver Nash, we agree with your perspective. Currently, it seems essential to formalize the gradient. However, we need to revise our format, mirroring the Deriv.lean file by defining HasGradientWithinAt, HasGradientAtFilter, HasGradientAt, and gradient. We plan to revise our code this week, though there may be some improvements in naming, such as changing gradient to …. Furthermore, the students participating in our computational mathematics formalization group hope that our work translating the statements into ε-δ language will be preserved. Many scenarios in computational mathematics, and most of the existing proofs, are based on the ε-δ language. Using Filter to formalize computational mathematics, especially for beginners, is out of sync with mainstream non-formalized textbooks, which can pose some obstacles in use.

Eric Wieser (Sep 11 2023 at 17:13):

I think keeping the epsilon-delta changes (assuming you're using them as hypotheses and in definitions) will be a rather harder sell, I'm afraid, though I'm not the best person to explain why.

Ziyu Wang (Sep 11 2023 at 17:19):

The main reason is that most students who have been trained in mathematical analysis and computational mathematics have not been exposed to Filter. We merely hope to retain the interfaces of a few theorems for subsequent formalization. We will adhere to Filter in refining the definition of the gradient itself. However, most of the subsequent theorems in computational mathematics concern functions from $\mathbb{R}^n$ to $\mathbb{R}$, and the epsilon-delta language is sufficient for these purposes.

Jireh Loreaux (Sep 11 2023 at 17:46):

sufficient ≠ easiest to use. In general, I would advocate for sticking with Filter language everywhere, and only going to ε and δ within proofs when necessary.

Kevin Buzzard (Sep 11 2023 at 17:51):

I should say that I develop real analysis with epsilons and deltas very early in my course, but my course is independent of mathlib (it uses mathlib, but in this part of the course I am just teaching them basic tactics). If the goal is to extend mathlib, then you should do things the mathlib way; there's no point fighting it.

Jireh Loreaux (Sep 11 2023 at 17:57):

But Kevin, you never do one-sided limits I guess? Otherwise combining all the different notions of convergence is absolutely hellish.

Kevin Buzzard (Sep 11 2023 at 17:59):

yeah this is just a warm-up: I get them to prove limit of sum is sum of limits and limit of product is product of limits, and at the end of it they've seen specialize and ring and linarith etc etc; we then go on to group theory.

Kevin Buzzard (Sep 11 2023 at 17:59):

Crucially, we then throw this code away and never use it again.

Anatole Dedecker (Sep 11 2023 at 20:21):

I definitely agree with everyone here that you should state everything using the filter API. But in principle you shouldn't have to know much about filters for this to work. You can just think of filters as "a way something can converge" and docs#Tendsto as "when x converges in some way, f(x) converges in some other way". More importantly, if you need to translate from filters to epsilon/deltas, we should have the necessary API (by which I mean we probably don't have it but it has its place in mathlib).
Most importantly, do not hesitate to ask for help on this, there are plenty of people who will be willing to help!

Jireh Loreaux (Sep 11 2023 at 20:38):

Mostly agreed, but I'm betting we have most of the necessary API already. I did this as an exercise with my students at one point (taking statements in filter language and using the library to show they were equivalent to the concrete, i.e. ε-δ, definitions), and we didn't really run into any problems.
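
For example, a sketch of that kind of translation, via Metric.tendsto_nhds_nhds (an existing lemma, if memory serves):

import Mathlib.Topology.MetricSpace.Basic

open Filter Topology

-- From a filter statement to the concrete ε-δ form, using library API.
example {f : ℝ → ℝ} {a L : ℝ} (h : Tendsto f (𝓝 a) (𝓝 L)) :
    ∀ ε > 0, ∃ δ > 0, ∀ x, dist x a < δ → dist (f x) L < ε := by
  intro ε hε
  obtain ⟨δ, hδ, H⟩ := Metric.tendsto_nhds_nhds.mp h ε hε
  exact ⟨δ, hδ, fun x hx => H hx⟩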

Yury G. Kudryashov (Sep 11 2023 at 23:15):

BTW, I think that we don't really need docs#HasFDerivAtFilter, just docs#HasFDerivWithinAt and HasFDerivAt f f' x := HasFDerivWithinAt f f' univ x.

Yury G. Kudryashov (Sep 11 2023 at 23:16):

At least, we never use HasFDerivAtFilter with a filter other than nhds or nhdsWithin.

Yury G. Kudryashov (Sep 11 2023 at 23:17):

Or do we (plan to) use it with something like 𝓝 x ⊓ μ.ae?

Ziyu Wang (Sep 25 2023 at 08:31):

Hello everyone, we have revised our code and created the files Gradient.Basic and Gradient.Calculus. Basic includes the definitions related to the gradient (HasGradientAtFilter, HasGradientWithinAt, HasGradientAt, Gradient), its relationship with fderiv and deriv, its relationship with continuity, the little-o characterization, and so on. Calculus mainly covers the chain rule for scalar functions composed with vector functions, addition and multiplication of gradients, finite sums and products, and more. We hope to bring more students at PKU into the formalization of computational mathematics, so we would like to upload the gradient-related content to Mathlib soon, and we look forward to reviews from the experts.
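
To give an idea of the statement shapes in Gradient.Calculus, here is an illustrative sketch in the notation of the earlier messages (statement only; the proof is omitted, and the actual PR may differ):

-- Plausible shape of the addition rule; `toDual` is additive, so this
-- should reduce to `HasFDerivAtFilter.add`. Proof omitted in this sketch.
theorem HasGradientAt.add {𝕜 F : Type*} [IsROrC 𝕜] [NormedAddCommGroup F]
    [InnerProductSpace 𝕜 F] [CompleteSpace F] {f g : F → 𝕜} {f' g' x : F}
    (hf : HasGradientAt f f' x) (hg : HasGradientAt g g' x) :
    HasGradientAt (fun y => f y + g y) (f' + g') x := sorry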

Ziyu Wang (Sep 26 2023 at 15:05):

So what should we do next to get our code merged? We have addressed the questions raised in #7074.

Sebastien Gouezel (Sep 26 2023 at 15:26):

As I asked on the PR, could you resolve all conversations which you think are resolved? Otherwise, we can't know if there are still pending issues.

Jireh Loreaux (Sep 26 2023 at 15:29):

To be clear: Sebastien means literally "click the 'resolve' button", after you have addressed the issue.

Eric Wieser (Sep 26 2023 at 15:36):

If the resolution isn't trivial, sometimes it's better to reply to the threads with something like "I think I resolved this by doing X, please check it looks ok"

Ziyu Wang (Sep 26 2023 at 15:42):

Sorry, this is our first time uploading something to Mathlib. We clicked the 'resolve' button just now.

Sebastien Gouezel (Sep 26 2023 at 16:52):

I have pushed a review.

Ziyu Wang (Sep 26 2023 at 16:54):

Thanks, we will correct our file tomorrow.

Ziyu Wang (Oct 13 2023 at 13:01):

We revised our code last week; could an expert take a look at it on GitHub?


Last updated: Dec 20 2023 at 11:08 UTC