Zulip Chat Archive

Stream: PhysLean

Topic: Computing `gradient`


Tomas Skrivan (Jun 22 2025 at 14:05):

IRight now, it is quite difficult computing gradient f as the API in mathlib is not well developed. I think gradient is not even the good primitive to work with. I made a PR622 that defines

   adjFDeriv π•œ f x = (fderiv π•œ f x).adjoint

then we have gradient f x = adjFDeriv π•œ f x 1 and working with adjFDeriv is easier as we can formulate composition theorem

theorem adjFDeriv_comp {f : F β†’ G} {g : E β†’ F} {x : E}
    (hf : DifferentiableAt π•œ f (g x)) (hg : DifferentiableAt π•œ g x) :
    adjFDeriv π•œ (fun x => f (g x)) x = fun dy => adjFDeriv π•œ g x (adjFDeriv π•œ f (g x) dy)

One difficulty when working with gradient/adjFDeriv is thatΒ XΓ—YΒ is not an inner product space so we have to work withΒ WithLp 2 (XΓ—Y)Β which does not have as well developed API. Part of the PR is to defineΒ ProdL2 X Y := WithLp 2 (XΓ—Y)Β and corresponding API to make our life easier.

The PR is just a draft and there a bunch of sorry, any help would be appreciated.
@ZhiKai Pong I think this PR would help you to do bunch of calculations you have been doing.

Tomas Skrivan (Jun 22 2025 at 14:09):

This PR is also necessary to derive Euler-Lagrange equations by using variational calculus.

ZhiKai Pong (Jun 22 2025 at 14:48):

I have been quite busy recently and haven't been following very closely on what you and Joseph are working on, but many thanks for those and I'll definitely go through them when I have time :)

Joseph Tooby-Smith (Jun 22 2025 at 19:17):

Great! Somewhat related, I made PR620 a couple of days ago which relates Space.grad to gradient.

Tomas Skrivan (Jun 22 2025 at 22:35):

I have significantly reworked how adjoint works. I really hate the situation with XΓ—Y vs WithLp 2 (XΓ—Y) so instead I have introduced the class

class InnerProductSpace' (π•œ : Type*) (E : Type*) [RCLike π•œ] [NormedAddCommGroup E] [NormedSpace π•œ E]
    extends Normβ‚‚ E where
  core : InnerProductSpace.Core π•œ E
  /-- Norm induced by inner is topologicaly equivalent to the given norm -/
  inner_top_equiv_norm : βˆƒ c d : ℝ,
    c > 0 ∧ d > 0 ∧
    βˆ€ x : E, (c β€’ β€–xβ€–^2 ≀ re (core.inner x x)) ∧ (re (core.inner x x) ≀ d β€’ β€–xβ€–^2)

which attaches inner product to normed space and requires that the norm induced by the inner product is topologically equivalent to the existing norm on the space. This way we can have inner product on XΓ—Y and not faff around with WithLp. For example, for f : ℝ β†’ ℝ β†’ ℝ we can write adjFDeriv ℝ β†Ώf and Lean will not complain that the is no instance InnerProductSpace π•œ (ℝ×ℝ) as it does when we write gradient β†Ώf.

The only sorries are related to the topological equivalence between X equipped with its norm topology and the topology induced by the inner product.

Joseph Tooby-Smith (Jun 23 2025 at 05:32):

How should I think of the difference between InnerProductSpace' and InnerProductSpace?
Is one more general than the other, or are they secretly equivalent?

Tomas Skrivan (Jun 23 2025 at 06:32):

I added doc string to InnerProductSpace'


Effectively as InnerProductSpace π•œ E but it does not requires that β€–xβ€–^2 = βŸͺx,x⟫. It is
only required that they are equivalent βˆƒ (c > 0) (d > 0), c β€’ β€–xβ€–^2 ≀ βŸͺx,x⟫ ≀ d β€’ β€–xβ€–^2. The main
purpose of this class is to provide inner product space structure on product types ExF and
pi types ΞΉ β†’ E without using WithLp gadget.

If you want to access Lβ‚‚ norm use β€–xβ€–β‚‚ := √βŸͺx,x⟫.

This class induces InnerProductSpace π•œ (WithLp 2 E) which equips β€–Β·β€– on X with Lβ‚‚ norm.
This is very useful when translating results from InnerProductSpace to InnerProductSpace'
together with toL2 : E β†’L[π•œ] (WithLp 2 E) and fromL2 : (WithL2 2 E) β†’L[π•œ] E.

In short we have these implications:

  InnerProductSpace π•œ E β†’ InnerProductSpace' π•œ E
  InnerProductSpace' π•œ E β†’ InnerProductSpace π•œ (WithLp 2 E)

The reason behind this type class is that with current mathlib design the requirement
β€–xβ€–^2 = βŸͺx,x⟫ prevents us to give inner product space structure on product type EΓ—F and pi
type ΞΉ β†’ E as they are equipped with max norm. One has to work with WithLp 2 (EΓ—F) and
WithLp 2 (ΞΉ β†’ E). This places quite a bit inconvenience on users in certain scenarios,
in particular the main motivation behind this class is to make computations of adjFDeriv and
gradient easy.

Joseph Tooby-Smith (Jun 23 2025 at 06:50):

This makes sense! Thanks for the explanation.

Tomas Skrivan (Jun 23 2025 at 06:56):

I really tried making adjFDeriv working with WithLp but it was really painful and I was constantly accidentally causing defeq abuse between X⨯Y and WithLp 2 (X×Y). If I can't get it right then there is no hope for beginners to get it right.

Tomas Skrivan (Jun 23 2025 at 06:59):

I would like to have InnerProductSpace' in mathlib and ideally InnerProductSpace just extends it and most of the results are formulated in terms of InnerProductSpace'. However, when I talked about it I did not get any meaningful response to the proposed change.

Joseph Tooby-Smith (Jun 23 2025 at 07:13):

Yeah, I agree with the philosophy here. Anything that avoids defeq abuse or having to worry about things being the correct type, I think is a massive benefits to usability - and is a massive turn-off using Lean when it's not done right.

Joseph Tooby-Smith (Jun 23 2025 at 07:15):

I think this is especially true for physicists where the tolerance for dealing with this type of thing is much lower than it is for mathematicians.

Tomas Skrivan (Jun 23 2025 at 07:21):

Yeah, I feel that sometimes complains like that falls on deaf ears as people actually deciding on changing mathlib got used to lots of the idiosyncrasies.

Kevin Buzzard (Jun 23 2025 at 07:38):

My reading of the conversation about inner products was that Tomas said "for what I'm doing the following set up would be good" and the implicit response was "for other projects doing different things this wouldn't be good". But I might be misreading. Tomas never wants this max distance on a product but other people do

Kevin Buzzard (Jun 23 2025 at 07:39):

Maybe it's similar to how Tomas doesn't want to bundle things like group homomorphisms because they're inconvenient for FunProp but people doing other things think they're great.

I think it's basically impossible to make one system which is perfect for everyone

Tomas Skrivan (Jun 23 2025 at 07:54):

Kevin Buzzard said:

I think it's basically impossible to make one system which is perfect for everyone

Yeah I understand that and I use mathlib/Lean in a quite different way than most people so it makes sense I don't get my way and I try my best to find workarounds.

Kevin Buzzard (Jun 23 2025 at 07:58):

I agree that the inner product conversation just petered out though rather than people having a sensible discussion. Is there a way of taking it further e.g. making a PR or would this be very time-consuming?

Tomas Skrivan (Jun 23 2025 at 08:06):

I will draft a PR to mathlib if I find the time. The core of it will be this PR to PhysLean but I'm not sure about the downstream effects/generalizations in mathlib.

Dominic Steinitz (Jun 23 2025 at 13:25):

In my head, gradients are the differential 1-form of a 0-form. I thought someone was working on differential forms but maybe both my premises are wrong.

Tomas Skrivan (Jun 23 2025 at 13:27):

This you get already with fderiv, for f : X β†’ ℝ you have (fderiv ℝ f x) : X β†’L[ℝ] ℝ which is effectively 1-form.

Tomas Skrivan (Jun 23 2025 at 13:27):

The goal of gradient is also include the identification (X β†’L[ℝ] ℝ) ≃ X given by inner product.

Tomas Skrivan (Jun 23 2025 at 13:30):

For example this expression often encountered in optimization x + s β€’ βˆ‡ f x type checks only if βˆ‡ f x has the same type as x thus it can't be 1-form.

Same for ODE αΊ‹ = βˆ‡ f x. It makes sense only if x has the same type as βˆ‡ f x

Dominic Steinitz (Jun 23 2025 at 13:39):

Do we not normally do this via a Riemannian metric? But I know they don't quite exist in Mathlib yet.

Tomas Skrivan (Jun 23 2025 at 13:42):

Well I'm doing everything only on vector spaces, so it needs inner product structure. But yes, on manifolds you need Riemannian metric.


Last updated: Dec 20 2025 at 21:32 UTC